Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI | Lex Fridman Podcast #153

The following is a conversation with Dmitry Korkin, his second time on the podcast. He's a professor of bioinformatics and computational biology at WPI, where he specializes in bioinformatics of complex disease, computational genomics, systems biology, and biomedical data analytics. He loves biology, he loves computing, plus he is Russian and recites a poem in Russian at the end of the podcast. What else could you possibly ask for in this world?

Quick mention of our sponsors: Brave Browser, NetSuite Business Management Software, Magic Spoon Low Carb Cereal, and 8sleep Self Cooling Mattress. So the choice is browsing privacy, business success, healthy diet, or comfortable sleep. Choose wisely, my friends, and if you wish, click the sponsor links below to get a discount and to support this podcast.

As a side note, let me say that to me, the scientists that did the best apolitical, impactful, brilliant work of 2020 are the biologists who study viruses without an agenda, without much sleep, to be honest, just with a pure passion for scientific discovery and exploration of the mysteries within viruses. Viruses are both terrifying and beautiful. Terrifying because they can threaten the fabric of human civilization, both biological and psychological. Beautiful because they give us insights into the nature of life on Earth, and perhaps even extraterrestrial life of the not so intelligent variety that might meet us one day as we explore the habitable planets and moons in our universe.

If you enjoy this thing, subscribe on YouTube, review it on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter.

And now, here's my conversation with Dmitry Korkin.

It's often said that proteins and the amino acid residues that make them up are the building blocks of life. Do you think of proteins in this way, as the basic building blocks of life?

So proteins are indeed the basic biological unit that carries out important functions of the cell. However, through studying proteins and comparing them across different species, across different kingdoms, you realize that proteins are actually much more complicated. They have so-called modular complexity. What I mean by that is that an average protein consists of several structural units, which we call protein domains. You can imagine a protein as a string of beads, where each bead is a protein domain. In the past 20 years, scientists have been studying the nature of protein domains because we realized that the domain is the unit. If you look at function, many proteins have more than one function, and those functions are often carried out by individual protein domains. We also see that in evolution, protein domains get shuffled, so they actually act as a unit. The same holds from the structural perspective. Some people think of a protein as a sort of globular molecule, but as a matter of fact, the globular part of the protein is a protein domain. So what we often have, again, is a collection of these protein domains aligned on a string like beads. And the protein domains, in turn, are made up of amino acid residues.

So you're saying the protein domain is the basic building block of the function that we think about proteins doing. Of course, you can always talk about different building blocks; it's turtles all the way down. But there's a point in the hierarchy where you have the cleanest building block, from which you can put things together in different kinds of ways to form complex functions. And you're saying that's the protein domain. Why is that not talked about as often in popular culture?

Well, there are several perspectives on this. One, of course, is the historical perspective. Historically, scientists have been able to structurally resolve, that is, to obtain the 3D coordinates of, smaller proteins. And smaller proteins tend to be single-domain proteins, so the protein is equal to a protein domain. Because of that, the initial suspicion was that proteins have globular shapes, and the more small protein structures you obtained, the more convinced you became that that's the case. Only later did we start having alternative approaches. The traditional ones are X-ray crystallography and NMR spectroscopy; these are the two main techniques that give us 3D coordinates. But nowadays there's a huge breakthrough in cryo-electron microscopy, a more advanced method that allows us to get at the 3D shapes of much larger molecules and molecular complexes. Just to give you one of the common examples from this year: the first experimental structure of a SARS-CoV-2 protein was the cryo-EM structure of the S protein, the spike protein. It was solved very quickly, and the reason for that is that the advancement of this technology is pretty spectacular.

How many domains does the... is it more than one domain?

Oh yes, it's a very complex structure. And on top of the complexity of a single protein, this structure is actually a complex, a trimer. It needs to form a trimer in order to function properly. A complex is an agglomeration of multiple proteins. We can have the same protein made in multiple copies, forming something that we call a homo-oligomer. Homo means the same. So in this case, the spike protein is an example of a homotrimer.

So you need three copies of it?

We have three chains, three molecular chains coupled together and performing the function. When you look at this protein from the top, you see a perfect triangle. Other complexes are made up of different proteins. Some of them are completely different; some of them are similar. The hemoglobin molecule, for example, is a protein complex. It's made of four basic subunits. Two of them are identical to each other, and the other two are identical to each other, but the two pairs are also similar to each other, which gives us some ideas about the evolution of this molecule. One of the hypotheses is that in the past it was just a homotetramer, four identical copies, and then over time it became mutated and more specialized.

Can we linger on the spike protein for a little bit? Is there something interesting or beautiful you find about it?

I mean, first of all, it's an incredibly challenging protein. As part of our research to understand the structural basis of this virus, to structurally decode every single protein in its proteome, we've been working on this spike protein. One of the main challenges was that the cryo-EM data allows us to obtain the 3D coordinates of only roughly two thirds of the protein. The remaining third is the part that is buried in the membrane of the virus, the viral envelope, and it also has a lot of unstable structures around it.

So it's chemically interacting somehow with whatever the heck it's connecting to.

Yeah, so people are still trying to understand the nature and the role of this one third, because the primary function of the top part is to get attached to the ACE2 receptor, the human receptor. There are also beautiful mechanics in how this happens. Because there are three copies of the chain, there are three domains (we were talking about domains): the receptor binding domains, RBDs, that get untangled and get ready to attach. And they are not necessarily going in sync. As a matter of fact...

It's asynchronous.

Yes, and this is where another level of complexity comes into play, because right now what we typically see is just one of the arms going out and getting ready to be attached to the ACE2 receptor. However, there was a recent mutation in the spike protein that people studied. Very recently, a group from UMass Medical School that we happen to collaborate with, the group of Jeremy Luban and a number of other faculty, actually solved the structure of the mutated spike. And they showed that, because of these mutations, you have more than one arm opening up. The frequency of two arms going up increases quite drastically.

Does that change the dynamics somehow?

It potentially can change the dynamics, because now you have two possible opportunities to get attached to the ACE2 receptor. It's a very complex molecular, mechanistic process. But the first step of this process is the attachment of the spike trimer to the human ACE2 receptor, a molecule that sits on the surface of the human cell. And that's essentially what initiates, what triggers, the whole process of encapsulation.

If this was dating, this would be the first date. So is it possible to have the spike protein just floating about on its own, or does it need that interaction with the membrane?

Yeah, it needs to be attached, at least as far as I know. But once this thing is attached to the surface, there is also a lot of dynamics in how it sits on the surface. For example, there was recent work where, again, people used cryo-electron microscopy to get the first glimpse of the overall structure. It's very low resolution, but you still get some interesting details about the surface and about what is happening inside, because until this recent work we had literally no clue about how the capsid is organized. The capsid is essentially the inner core of the viral particle, where the RNA of the virus sits, and it's protected by another protein, the N protein, which essentially acts as a shield. But now we are learning more and more that it's not just a shield; it is potentially also used for the stability of the outer shell of the virus. So it's pretty complicated.

And I mean, understanding all of this is really useful for trying to figure out, like, developing a vaccine or some kind of drug to attack any aspect of this, right?

So, I mean, there are many different implications. First of all, it's important to understand the virus itself: how it acts, and what the overall mechanistic process of this virus's replication and proliferation in the cell is. So that's one aspect. The other aspect is designing new treatments. One of the possible treatments is designing nanoparticles that resemble the viral shape, have the spike integrated, and essentially act as a competitor to the real virus by blocking the ACE2 receptors, thus preventing the real virus from entering the cell.

There is also a very interesting direction in looking at the membrane, the envelope portion of the virus, and attacking its M protein. To give you a brief overview, there are four structural proteins; these are the proteins that make up the structure of the virus. S, the spike protein, acts as a trimer, so it needs three copies. E, the envelope protein, acts as a pentamer, so it needs five copies to act properly. M is the membrane protein; it forms dimers, and actually it forms a beautiful lattice. This is something we've been studying and are seeing in simulations. It forms a very nice grid, or threads, of different dimers attached next to each other.

Just a bunch of copies of each other, and naturally, when you have a bunch of copies of each other, they form an interesting lattice.

And if you think about it, the viral shape of this complex needs to be organized somehow, self-organized somehow. If it was a completely random process, you probably wouldn't have an envelope shell of ellipsoid shape; you would have some pretty random shape. So there is some regularity in how these M dimers attach to each other in a very specific, directed way.

Is that understood at all?

It's not understood. We've been working on it for the past six months, since we met, actually. That's when we started trying to understand the overall structure of the envelope and the key components that make up this...

Wait, does the envelope also have the lattice structure?

So the envelope is essentially the outer shell of the viral particle. The N, the nucleocapsid protein, is something that is inside. But it's likely that N interacts with M.

Does it go M and E? Like, where's the E and the M?

So these different proteins occur in different numbers of copies on the viral particle. E, this pentamer complex, we only have two or three of, maybe, per particle, okay? We have a thousand or so M dimers that essentially make up the entire outer shell. So most of the outer shell is the M, the M protein.

When you say particle, that's the virion, the virus, the individual virus?

Yes, it's a single element of the virus, a single virus.

Single virus, right.

And we have roughly 50 to 90 spike trimers. So when you show a...

Per virus particle.

Per virus particle.

Sorry, what did you say, 50 to 90?

So this is how this thing is organized.

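The copy numbers quoted above can be turned into a rough tally of individual protein chains per virion by multiplying the number of complexes by each complex's oligomeric state (trimer = 3, pentamer = 5, dimer = 2). The Python sketch below does only that arithmetic. The ranges are the approximate figures from the conversation, and "a thousand or so" M dimers is taken as exactly 1,000 purely for illustration; these are not measured values.

```python
# Oligomeric state: how many identical chains form one functional complex.
OLIGOMER_SIZE = {"S": 3, "E": 5, "M": 2}

# Approximate complexes per virion as quoted in the conversation, as (low, high).
# Assumption: "a thousand or so" M dimers is taken as exactly 1000.
COMPLEXES_PER_VIRION = {
    "S": (50, 90),      # spike trimers
    "E": (2, 3),        # envelope pentamers
    "M": (1000, 1000),  # membrane dimers (rough)
}


def chains_per_virion(protein: str) -> tuple[int, int]:
    """Return the (low, high) number of individual protein chains per virion."""
    low, high = COMPLEXES_PER_VIRION[protein]
    k = OLIGOMER_SIZE[protein]
    return low * k, high * k


if __name__ == "__main__":
    for name in ("S", "E", "M"):
        low, high = chains_per_virion(name)
        print(f"{name}: {low}-{high} chains")
```

Under these rough numbers there are about 150 to 270 spike chains, 10 to 15 E chains, and about 2,000 M chains, so M outnumbers S by roughly an order of magnitude, consistent with the point that most of the outer shell is M.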
And so now, typically, you see antibodies that target certain parts of the spike protein, but there could also be other treatments: small molecules that bind strategic parts of these proteins, disrupting their function. One of the promising directions, and one of the newest, is actually targeting the M dimer, targeting the proteins that make up this outer shell. Because if you're able to destroy the outer shell, you're essentially destroying the viral particle itself, preventing it from functioning at all.

So you think, from a sort of cybersecurity perspective, a virus security perspective, that's the best attack vector? Or, like, a promising attack vector?

I would say, yeah. I mean, there's still tons of research that needs to be done. But yes, I think so.

There's more attack surface, I guess.

More attack surface. But also, from our analysis and from other evolutionary analyses, this protein is evolutionarily more stable compared to, say, the spike protein.

Oh, and stable means a more static target?

Well, yeah, it doesn't change. From the evolutionary perspective, it doesn't evolve as drastically as, for example, the spike protein.

There's a bunch of stuff in the news about mutations of the virus in the United Kingdom. I also saw something in South Africa, maybe that was yesterday. You just mentioned stability and so on. Which aspects of this are mutable, and which aspects, if mutated, become more dangerous? And maybe even zooming out, what are your thoughts and knowledge and ideas about the way it's mutated, all the news that we've been hearing? Are you worried about it from a biological perspective? Are you worried about it from a human perspective?

So, I mean, mutations are the general way for these viruses to evolve. This is the way they evolve. This is the way they were able to jump from one species to another. We also see some recent jumps; there were some incidents of this virus jumping from humans to dogs. So there is some danger in those jumps, because every time it jumps, it also mutates. When it jumps to another species and jumps back, it acquires some mutations that are driven by the environment of the new host, which is different from the human environment. And so we don't know whether the mutations acquired in the new species are neutral with respect to the human host or maybe damaging.

Yeah, change is always scary. But are you worried... I mean, it seems like the spread during winter now is exceptionally high, and especially with a vaccine just around the corner, already actually being deployed, is there some worry that this puts evolutionary pressure, selective pressure, on the virus to mutate? Is that a source of worry?

Well, there is always this thought in a scientist's mind: what will happen? I know there have been discussions about the arms race between humanity's ability to get vaccinated and the virus essentially becoming resistant to the vaccine first. I don't worry that much, simply because there is not that much evidence for that.

For aggressive mutation around the vaccine.

Exactly. Obviously there are mutations around vaccines; that's the reason we get vaccinated every year, against the seasonal mutations. But I think it's important to study it. To me, and again I might be biased because we've been trying to do this as well, one of the critical directions in understanding the virus is to understand its evolution, in order to understand the key mechanisms that lead zoonotic viruses to jump from one species to another, and the mechanisms that lead a virus to become resistant to vaccines and also to treatments. And hopefully that knowledge will enable us to forecast the future evolutionary traces of this virus.

I mean, from a biological perspective, and this might be a dumb question, are there parts of the virus that, if souped up through mutation, could make it more effective? We're talking about this specific coronavirus, because we were talking about the different parts: the membrane, the M protein, the E protein, the N, and the S, the spike. Is there some...

And there are 20 or so more in addition to that.

But is that a dumb way to look at it? Like, which of these, if mutated, could have the greatest impact, potentially damaging impact, on the effectiveness of the virus?

So it's actually a very good question, and the short answer is, we don't know yet. But of course this virus has the capacity to become more efficient. The reason for that is that if you look at the virus, it's a machine. It's a machine that does a lot of different functions, and many of these functions are nearly perfect, but they're not perfect. Mutations can have the greatest impact by making those functions more perfect. For example, the attachment of the spike to the ACE2 receptor: has this virus reached maximal efficiency in how the attachment is carried out? Or are there mutations still to be discovered that will make this attachment stronger, or in some way more efficient from the point of view of the virus's functioning? That's the obvious example. But if you look at each of these proteins, each is there for a reason; it performs a certain function. It could be that certain mutations will enhance this function. It could also be that some mutations will make this function much less efficient. So that's also the case.

Since we're talking about the evolutionary history of a virus, let's zoom back out and look at the evolution of proteins. I glanced at this 2010 Nature paper on, quote, the "ongoing expansion of the protein universe." It kind of implies, and talks about, proteins starting with a common ancestor, which is kind of interesting. It's interesting to think about even just the first organic thing that started life on Earth. And from that, what is it, 3.5 billion years later, there are now millions of proteins, and they're still evolving. And that's, in part, one of the things that you're researching. Is there something interesting to you about the evolution of proteins from this initial ancestor to today? Is there something beautiful and insightful about this long story?

So I think, if I were to pick a single keyword about protein evolution, I would pick modularity, something we talked about in the beginning. That's the fact that proteins are no longer considered as just a sequence of letters. There are hierarchical complexities in the way these proteins are organized, and these complexities go beyond the protein sequence, all the way back to the gene, to the nucleotide sequence. Again, protein domains are not only functional building blocks; they are also evolutionary building blocks. What we see in the later stages of evolution is that once these structurally and functionally stable building blocks were discovered, those domains essentially stayed as such. That's why, if you start comparing different proteins, you will see that many of them have similar fragments, and those fragments correspond to something we call protein domain families. They are still different, because you still have mutations, and different mutations contribute to the diversification of the function of these protein domains. However, you very rarely see evolutionary events that split a domain into fragments, because once you split a domain, you can completely cancel out its function, or at the very least reduce it. And that's not efficient from the point of view of cell functioning. So the protein domain level is a very important one.

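One way to picture the comparison described above is to treat each protein as an ordered string of domain-family labels (the beads) and ask which families two proteins share. The Python sketch below does only that. The protein names and domain labels are hypothetical placeholders, not real annotations, and a real analysis would first have to detect domains in the sequences (for example with profile hidden Markov models), which this skips.

```python
from collections import Counter

# Hypothetical domain architectures: each protein is an ordered "string of beads",
# one label per domain family, from N- to C-terminus. Labels are placeholders.
ARCHITECTURES = {
    "protein_1": ["DomA", "DomB", "DomC"],
    "protein_2": ["DomB", "DomC"],
    "protein_3": ["DomC", "DomA", "DomD"],  # DomA and DomC appear in shuffled order
}


def shared_families(p: str, q: str) -> set[str]:
    """Domain families present in both proteins, ignoring their order."""
    return set(ARCHITECTURES[p]) & set(ARCHITECTURES[q])


def family_usage() -> Counter:
    """Number of proteins that contain each domain family at least once."""
    return Counter(f for domains in ARCHITECTURES.values() for f in set(domains))


if __name__ == "__main__":
    print(sorted(shared_families("protein_1", "protein_3")))  # ['DomA', 'DomC']
    print(family_usage())
```

In this toy set, protein_3 carries DomA and DomC in the opposite order from protein_1, which is the kind of rearrangement meant by domains being shuffled as intact units rather than split apart.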
Now, on top of that, if you look at proteins, you have these structural units that carry out the function, but much less is known about the things that connect the protein domains, something we call linkers. Linkers are completely flexible parts of the protein that nevertheless carry out a lot of function.

So it's like little tails, little heads.

So we do have tails. They're called termini, the C and N termini. These are the parts right at one end and the other end of the protein sequence. They are also very important; they contribute to very specific interactions between proteins.

But you're referring to the links between domains.

That connect the domains. Apart from the simple perspective, if you have a very short linker, you have two domains that are forced to be next to each other. If you have a very long one, you have domains that are extremely flexible and can undergo a lot of spatial reorganization. But on top of that, the linker itself, because it's so flexible, can actually adapt to a lot of different shapes, and therefore it's a very good interactor when it comes to interactions between this protein and other proteins. These things also evolve, and in a way they have different driving laws underlying their evolution, because unlike protein domains, they no longer need to preserve a certain structure.

And on top of that, you have something that is even less studied, and this is related to the concept of alternative splicing. Alternative splicing is a very cool concept. It's something we've been fascinated by for over a decade in my lab and have been trying to do research on.

Typically, a simplistic perspective is that one gene equals one protein product. You have a gene, you transcribe and translate it, and it becomes a protein. In reality, when we talk about eukaryotes, especially more recent eukaryotes that are very complex, a gene is no longer equal to one protein. It can actually produce multiple functionally active protein products, and each of them is called an alternatively spliced product. The reason this happens is that the gene also has blocks, and it essentially goes like this: we have a block that will later be translated, which we call an exon. Then we have a block that is not translated but cut out, which we call an intron. So we have exon, intron, exon, intron, et cetera, and sometimes you can have dozens of these exons and introns. What happens during the process when the gene is converted to RNA is that the introns are cut out and the exons get assembled together. And sometimes some of the exons will be thrown out as well, and the remaining exons will still be assembled into a protein product. So now you have fragments of the protein that are no longer there; they were cut out along with the introns. Sometimes you will essentially take one exon and replace it with another one. So there's some flexibility in this process, and that creates a whole new level of complexity.

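The exon/intron bookkeeping described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the gene is a hypothetical list of exon and intron segments with made-up short sequences, and splicing is modeled purely as "keep a chosen subset of exons, in their original order, and drop every intron". Real splicing is governed by splice-site signals and regulatory proteins, none of which are modeled here.

```python
# Hypothetical gene as an ordered list of (kind, name, sequence) segments.
# Sequences are made up; each intron starts with GU and ends with AG,
# the dinucleotides found at the boundaries of most real introns.
GENE = [
    ("exon", "E1", "AUGGCU"),
    ("intron", "I1", "GUAAAG"),
    ("exon", "E2", "GGAUCC"),
    ("intron", "I2", "GUGCAG"),
    ("exon", "E3a", "UUCAAA"),
    ("intron", "I3", "GUACAG"),
    ("exon", "E3b", "CUGACC"),
    ("intron", "I4", "GUUUAG"),
    ("exon", "E4", "CCGUAA"),
]


def splice(gene, keep_exons):
    """Remove every intron and join the chosen exons in their original order."""
    return "".join(
        seq for kind, name, seq in gene
        if kind == "exon" and name in keep_exons
    )


# Three products of the same gene (assumed exon choices, for illustration only).
ISOFORMS = {
    "E3a included": {"E1", "E2", "E3a", "E4"},
    "E2 skipped": {"E1", "E3a", "E4"},                # exon skipping
    "E3b instead of E3a": {"E1", "E2", "E3b", "E4"},  # mutually exclusive exons
}

if __name__ == "__main__":
    for label, exons in ISOFORMS.items():
        mrna = splice(GENE, exons)
        print(f"{label:20s} {mrna} ({len(mrna) // 3} codons)")
```

Each isoform here still reads in frame from the AUG start to the UAA stop, which is a convenient property of the made-up sequences rather than something splicing guarantees: skipping an exon whose length is not a multiple of three would shift the reading frame.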
Is this random, though?

This is where, I think, the appearance of modern single-cell, and before that tissue-level, next-generation sequencing techniques such as RNA-seq allows us to see that these are events that often happen in response to something. It's a dynamic event that happens in response to disease or to a certain developmental stage of a cell. And this is an incredibly complex layer that, because it's at the gene level, also undergoes a certain evolution. And now we have this interplay between what is happening in the protein world and what is happening in the gene and RNA world. For example, we often see that the boundaries of exons coincide with the boundaries of protein domains. So there is this close interplay. It's not always the case; otherwise it would be too simple. But we do see the connection between those machineries.

And obviously evolution will pick up this complexity, select for whatever is successful, whatever is an interesting function.

We see that complexity in play, and it makes this question more complex, but more exciting.

Small detour into the world of computer science; I don't know if you think about this. Douglas Hofstadter, I think, came up with the name quine for these things. I don't know if you're familiar with them, but they're computer programs that have, I guess, exons and introns, and the whole purpose of the program is to copy itself. So it prints copies of itself, but can also carry information inside of it. So it's a very kind of crude, fun exercise of asking: can we sort of replicate these ideas from cells? Can we have a computer program that, when you run it, just prints itself, the entirety of itself, and does it in different programming languages and so on? I've been playing around and writing them. It's a kind of fun little exercise.

You know, when I was a kid, so you know,
link |
it was essentially one of the sort of main stages
link |
in informatics Olympiads that you have to reach
link |
in order to be any so good,
link |
is you should be able to write a program
link |
that replicates itself.
link |
And so the task then becomes even sort of more complicated.
link |
So what is the shortest program?
link |
And of course, it's a function of a programming language,
link |
but yeah, I remember a long, long, long time ago
link |
when we tried to make it short and short
link |
and find the shortcut.
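
For readers who haven't seen one, here is a classic-style Python quine. This is one well-known construction, not the shortest possible. The string `s` is a template of the whole program. `%r` re-inserts the string's own quoted representation, and `%%` escapes a literal percent sign. Running it should print exactly its own two lines. Comments are deliberately left out, because the program would then have to reproduce them too.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The "shortest program" contests mentioned here push this idea further. Some languages allow much shorter quines.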

There's actually an entire Stack Exchange site called Code Golf, I think, where the whole thing is just the competition. People come up with whatever task, I don't know, like writing code that reports the weather today, and the competition is: in whatever programming language, what is the shortest program? People should check it out, because it makes you realize there are some weird programming languages out there.

But just to dig on that a little deeper: in computer science we don't often think about programs that way. Even in the machine learning world now, these are still kind of basic programs. And then there are humans, who replicate themselves, right? And there are these mutations and so on. Do you think we'll ever have a world where there are programs that have an evolutionary process? I'm not talking about evolutionary algorithms. I'm talking about programs that mate with each other, evolve, and replicate themselves on their own. The idea here is that that's how you can have a runaway thing. We think about machine learning as a system that gets smarter and smarter and smarter. But at least the machine learning systems of today are programs that you can turn off, as opposed to throwing a bunch of little programs out there and letting them multiply and mate and evolve and replicate. Do you ever think about that kind of world, when we jump from the biological systems that you're looking at to artificial ones?

I mean, it's almost like you take the area of intelligent agents, right? Which are essentially independent pieces of code that run, interact, and exchange information, right? So I don't see why not. It could be a natural evolution in this area of computer science. I think it's an interesting possibility.

It's terrifying too, but I think it's a really powerful tool. We have social networks with millions of people, and they interact. I think it's interesting to inject bots into that; bots have already been injected into that, right? But those bots are pretty dumb, probably pretty dumb algorithms. It's interesting to think that there might be bots that evolve together with humans, a sea of humans and robots operating, first, in the digital space. And then you can also think... I love the idea. I think at Harvard, at Penn, there are robotics labs that take as a fundamental task building a robot that, given extra resources, can build another copy of itself in the physical space, which is super difficult to do, but super interesting. I remember there's research on robots that can build a bridge. They make copies of themselves and connect themselves into a sort of self-building bridge made of building blocks. You can imagine a building that self-assembles. So it's basically self-assembling structures from robotic parts. But it would be interesting to add, within that robot, the ability to mutate and do all the interesting little things that you're referring to in evolution, to go from a single original protein building block to this weird complex.

And if you think about this, I mean, the bits and pieces are there. You mentioned evolutionary algorithms, right? There the goal is maybe in a way different: the goal is essentially to optimize your search, right? But the ideas are there. People recognize that recombination events lead to global changes in the search trajectories, while mutation events are a more refined step in the search.
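
A bare-bones genetic algorithm shows the split between recombination (large jumps) and mutation (small local steps). This is a minimal sketch with assumed choices throughout: the toy "OneMax" objective (maximize the number of 1 bits), one-point crossover, bit-flip mutation, truncation selection, and elitism. All parameter values are arbitrary illustrations.

```python
import random

def fitness(bits):
    # Toy "OneMax" objective: the number of 1s; the optimum is all ones.
    return sum(bits)

def crossover(a, b, rng):
    # One-point recombination: the child can land far from both parents
    # in a single step (a "global" move in the search space).
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rng, rate):
    # Bit-flip mutation: each bit flips with small probability (a "local" move).
    return [1 - x if rng.random() < rate else x for x in bits]

def evolve(n_bits=20, pop_size=30, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection: keep the top half
        children = [pop[0]]             # elitism: carry the best individual over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng, rate))
        pop = children
    return max(pop, key=fitness), history

best, history = evolve()
print(fitness(best), history[:5])
```

Because the best individual is always carried over, the best fitness can never decrease from one generation to the next. Crossover and mutation only add new candidates around it.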

Then you have other nature-inspired algorithms, right? One of the most fun ones, I think, is the slime mold algorithm. I think it was first introduced by a Japanese group, and it was able to solve some pretty complex problems. And I think there are still a lot of things we've yet to borrow from nature, right? There are a lot of ideas that nature has to offer us, and it's up to us to grab them and make the best use of them.
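
One well-known way to turn slime-mold behavior into an algorithm is the adaptive-network model sometimes called the "Physarum solver." Tubes are graph edges, flow obeys Kirchhoff's law, tubes that carry more flux thicken while idle ones decay, and the network tends to converge to the shortest source-to-sink path. The conversation doesn't say which algorithm is meant, so treat this as an assumed, simplified illustration. The graph, step size, and iteration count are made up.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def physarum(n_nodes, edges, source, sink, flow=1.0, dt=0.1, steps=500):
    """edges: list of (i, j, length). Returns the final conductivity of each edge."""
    d = [1.0] * len(edges)                           # initial tube conductivities
    free = [v for v in range(n_nodes) if v != sink]  # sink pressure is fixed at 0
    idx = {v: k for k, v in enumerate(free)}
    for _ in range(steps):
        # Kirchhoff: for each node, sum_j (D_ij / L_ij) * (p_i - p_j) = net inflow.
        a = [[0.0] * len(free) for _ in free]
        b = [0.0] * len(free)
        b[idx[source]] = flow
        for (i, j, length), dk in zip(edges, d):
            g = dk / length
            for u, v in ((i, j), (j, i)):
                if u in idx:
                    a[idx[u]][idx[u]] += g
                    if v in idx:
                        a[idx[u]][idx[v]] -= g
        p = solve_linear(a, b)
        pressure = [p[idx[v]] if v in idx else 0.0 for v in range(n_nodes)]
        # Adaptation rule dD/dt = |Q| - D: busy tubes thicken, idle tubes decay.
        for k, (i, j, length) in enumerate(edges):
            q = d[k] / length * (pressure[i] - pressure[j])
            d[k] += dt * (abs(q) - d[k])
    return d

# Two routes from node 0 to node 3: 0-1-3 (length 2) and 0-2-3 (length 4).
edges = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 2.0), (2, 3, 2.0)]
conductivity = physarum(4, edges, source=0, sink=3)
print([round(c, 3) for c in conductivity])
```

The tubes on the shorter route should end up carrying essentially all the flow, and the longer route should wither away. No edge is ever "told" which path is shortest.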

Including neural networks. We have a very crude inspiration from nature in neural networks. Maybe there are other inspirations to be discovered in the brain, or in other aspects of various systems, even the immune system and the way it interplays. I recently started to understand that the immune system has something to do with the way the brain operates. There are multiple things going on in there, none of which are modeled in artificial neural networks. And maybe if you throw a little bit of that biological spice in there, you'll come up with something cool.

I'm not sure if you're familiar with the Drake equation. I just did a video on it yesterday because I wanted to give my own estimate. It's an equation that combines a bunch of factors to estimate how many alien civilizations are in the galaxy.
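
In its standard form the Drake equation is a product of seven factors, N = R* · f_p · n_e · f_l · f_i · f_c · L. The factors are:

- R*: the rate of star formation.
- f_p: the fraction of stars with planets.
- n_e: the number of potentially habitable planets per such system.
- f_l, f_i, f_c: the fractions on which life, intelligence, and detectable communication arise.
- L: the lifetime of a detectable civilization.

The sketch below just multiplies the factors. The example inputs are hypothetical placeholders, apart from the 1% chance of life that comes up in this conversation.

```python
from math import prod

def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime_years):
    """Expected number of detectable civilizations in the galaxy.

    r_star: new stars formed per year; f_p: fraction with planets;
    n_e: habitable planets per planetary system; f_l: fraction where life
    emerges; f_i: fraction developing intelligence; f_c: fraction becoming
    detectable; lifetime_years: how long they stay detectable.
    """
    return prod([r_star, f_p, n_e, f_l, f_i, f_c, lifetime_years])

# Hypothetical inputs; only f_l = 0.01 comes from the conversation.
n = drake(r_star=1.5, f_p=1.0, n_e=0.2, f_l=0.01, f_i=0.1, f_c=0.1, lifetime_years=10_000)
print(f"N ≈ {n:.2f}")
```

Since N is a plain product, every factor scales it proportionally. Lowering the probability of life from 1% to 0.5%, for example, halves the estimate.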

I've heard about it, yes.

So some of the interesting parameters are how many stars are born every year, how many planets there are on average per star, and how many habitable planets there are. And then the one that starts being really interesting is the probability that life emerges on a habitable planet. You certainly think a lot about evolution, but do you think about the thing which evolution doesn't describe, the beginning of evolution, the origin of life? I think I put the probability of life developing on a habitable planet at 1%. This is very scientifically rigorous. Okay, well, first, at a high level, for the Drake equation: what would you put that percent at, based on Earth? And in general, do you have thoughts about how life might've started, like proteins being one of the early jumping-off points?

Yeah, so I think back in 2018 there was a very exciting paper published in Nature where they found one of the simplest amino acids, glycine, in comet dust. And I apologize if I don't pronounce it right; it's a comet with a Russian name, I think Churyumov-Gerasimenko. There was a mission to get close to this comet and collect the dust, and when scientists analyzed it, they actually found traces of glycine, which is one of the 20 basic amino acids that make up proteins, right? So that was very exciting. But the question is very interesting, right? If there is some alien life, is it going to be made of proteins? Or maybe RNAs? We see that RNA viruses are certainly a very well-established group of molecular machines, right? So yeah, it's a very interesting question.

What probability would you put on it? How hard is this job? How unlikely, just on Earth, do you think it was that this whole thing got going? Are we really lucky, or is it inevitable? What's your sense when you sit back and think about life on Earth? Is it higher or lower than 1%? Because 1% is pretty low, but it's still like, damn, that's a pretty good chance.

Yes, it's a pretty good chance. Personally, and again, I'm probably not the best person to make such estimates, I would intuitively put it lower. But still, given...

So we're really lucky here on Earth. Or the conditions are really good.

I think everything was right, in a way, right? Though the conditions were not ideal, if you look at what it was like several billion years ago, when life emerged.

There is something called the Rare Earth hypothesis which, in contrast to the Drake equation, says that the conditions of Earth, if you actually were to describe them, make it quite a special place. So special it might be unique in our galaxy, and potentially close to unique in the entire universe. It's very difficult to reconstruct those same conditions. And what the Rare Earth hypothesis argues is that all those different conditions are essential for life. So that's the counter to thinking that Earth is pretty average. I'm trying to remember them all, but there's the fact that it's shielded from a lot of asteroids, obviously the distance to the Sun, but also the fact that it has a perfect balance between the amount of water and land, and all those kinds of things. There's a long list of factors I don't remember. But it's fascinating to think that, in order for something like proteins, then DNA and RNA, and basic living organisms to emerge, you might need to be very close to an Earth-like planet, which would be sad or exciting, I don't know which.

If you ask me, in a way I draw a parallel with our own research. From the intuitive perspective, you have those two extremes, and reality very rarely falls into the extremes; the optimum is always reached somewhere in between. So that's what I tend to think: that we're probably somewhere in between. We're not unique, unique, but again, the chances are reasonably small.

The problem is we don't know the other extreme. I tend to think that we don't actually understand the basic mechanisms of what this all originated from. It seems like we think of life as this distinct thing, maybe intelligence is a distinct thing, maybe the physics from which planets and suns are born is a distinct thing. But it could be, like the Stephen Wolfram thing, that from simple rules emerges greater and greater complexity. So I tend to believe that life just finds a way. We don't know the extreme of how common life is, because it could be that life is everywhere. So everywhere that it's almost laughable that we're such idiots to think... who are you? It's like ants thinking that their little colony is the unique thing and everything else doesn't exist. I mean, it's also very possible that that's the extreme, and we're just not able to comprehend the nature of that life.

Just to stick on alien life for a brief moment more: there are some signs of life on Venus, in gaseous form. There's hope for life on Mars, probably extinct. We're not talking about intelligent life, although that has been in the news recently; we're talking about basic things like bacteria. And then also, I guess, there are a couple of moons.

Yeah, Europa, which is Jupiter's moon. I think there's another one.

Is that exciting or is it terrifying to you, that we might find life? Do you hope we find life?

I certainly do hope that we find life. It was very exciting to hear the news about possible life on Venus.

It'd be nice to have hard evidence of something, which is what the hope is for Mars and Europa.

But do you think those organisms will be similar biologically, or would they even be carbon-based, if we do find them?

I would say they would be carbon-based. How similar is a big question, right? The moment we discover things outside Earth, even if it's a tiny little single cell... I mean, there is so much. Just imagine that. I think that would be another turning point for science.

Especially if it's different in some very new way. Because that's a definitive statement, well, not definitive, but a pretty strong statement that life is everywhere in the universe. To me at least, that's really exciting.

You brought up Joshua Lederberg in an offline conversation. I'd love to talk to you about AlphaFold, and this might be an interesting way to enter that conversation. He won the 1958 Nobel Prize in Physiology or Medicine for discovering that bacteria can mate and exchange genes. But he also did a ton of other stuff, like, as we mentioned, helping NASA search for life on Mars, and the chemical expert system. Expert systems, remember those? What do you find interesting about this guy and his ideas about artificial intelligence in general?

So I have a personal story to share. I started my PhD in Canada back in 2000. In my PhD we were developing a new language for symbolic machine learning, which is different from feature-based machine learning. One of the cleanest applications of this approach, of this formalism, was to cheminformatics and computer-aided drug design. As part of my research, I developed a system that looked at chemical compounds of, say, the same therapeutic category, male hormones, right? It tried to figure out which structural fragments, the structural building blocks, are important and define this class, versus building blocks that are there just to complete the structure and are not the ones that make up the key chemical properties of this therapeutic category.
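
The goal described here is to separate fragments that characterize a class of compounds from fragments that are merely present. A simple frequency filter can illustrate that goal. This is not the symbolic formalism from the PhD work, which isn't described in detail here. It's an assumed stand-in: each molecule is reduced to a set of hypothetical fragment labels. A fragment counts as class-defining if it appears in every compound of the class but rarely in a background set.

```python
def class_defining_fragments(actives, background, min_support=1.0, max_background=0.2):
    """Fragments frequent in the class of interest but rare in the background.

    actives / background: lists of sets of fragment labels (one set per molecule).
    """
    fragments = set().union(*actives)
    selected = []
    for frag in sorted(fragments):
        support = sum(frag in mol for mol in actives) / len(actives)
        bg_rate = sum(frag in mol for mol in background) / len(background)
        if support >= min_support and bg_rate <= max_background:
            selected.append(frag)
    return selected

# Hypothetical fragment sets, loosely modeled on steroid hormones.
actives = [
    {"steroid_core", "17-OH", "3-keto"},
    {"steroid_core", "17-OH", "3-keto", "methyl"},
    {"steroid_core", "17-OH", "3-keto", "ester"},
]
background = [
    {"benzene", "methyl"},
    {"benzene", "ester"},
    {"steroid_core", "3-OH"},
    {"amine", "methyl"},
    {"ester"},
]
print(class_defining_fragments(actives, background))
```

Here "methyl" and "ester" are filtered out because only some class members carry them. Those are the "just there to complete the structure" kind of fragments.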

For me, it was something new. I was trained as an applied mathematician, with some machine learning background, but computer-aided drug design was completely new territory. Because of that, I often found myself asking lots of questions on one of the central forums. Back then, there was no Facebook or anything like that. There was a forum, essentially a bulletin board.

Yeah, so you have a bunch of people, you post a question, and you get answers from different people. Back then, one of the most popular forums was CCL, the Computational Chemistry List. That was the forum. And there I...

Asked a lot of dumb questions.

Yes, I asked questions. I also shared some information about our formalism, how we do things, and whether what we do makes sense. And I still remember one of these posts; I called it something like "Desperately looking for a chemist's advice," right? I posted my question and explained what our formalism is, what it does, and what kind of applications I was planning. It was the middle of the night, and I went back to bed. The next morning I got a phone call from my advisor, who also followed this forum. He said, "You won't believe who replied to you." I said, "Who?" And he said, "Well, there is a message to you from Joshua Lederberg." And my reaction was, "Who is Joshua Lederberg?"

Your advisor hung up.

So essentially, Joshua wrote to me: "We had conceptually similar ideas in the DENDRAL project. You may want to look it up."

And we should also say, sorry, as a side comment, that he won the Nobel Prize at a really young age, in '58. He was, I think, what, 33? So here he is, decades later, responding to young whippersnappers on the CCL forum.

And back then he was already very senior. He unfortunately passed away in 2008, but back in 2001 he was a professor emeritus at Rockefeller University. And that was actually, believe it or not, one of the reasons I decided to join the group of Andrej Sali at Rockefeller University as a postdoc: I hoped I would have a chance to meet Joshua in person. And I met him very briefly, right? He was walking across the little bridge that connects the research campus with the skyscraper Rockefeller owns, where postdocs, faculty, and graduate students live. So I met him and had a very short conversation. But I had started reading about DENDRAL, and I was amazed. We're talking about the 1960s, right? The ideas were so profound.

Well, what's fundamental about the ideas of it?

The reason it was made is even crazier. Lederberg wanted to build a system that would help him study extraterrestrial molecules. The way you study extraterrestrial molecules is with mass spec analysis, right? Mass spec gives you bits, numbers, that give you ideas about the possible fragments: atoms, and maybe little fragments, the pieces that make up the molecule, right? So now you need to decompose this information and figure out what the whole was before it broke into fragments, bits and pieces, right? To build this tool, Lederberg's idea was to connect chemistry and computer science and design a so-called expert system. It takes as input the mass spec data and a database of possible molecules, and tries to induce the molecule that would correspond to the spectrum. What the project ended up being was a system that provides a list of candidates, which a chemist then looks at to make the final decision.

But the original idea, I suppose, was to solve the entirety of this problem automatically.

So back then, he approached...

In the 60s.

Yes, believe it or not. It's amazing. It still blows my mind. This was essentially the origin of modern bioinformatics and cheminformatics, back in the 60s. Every time you deal with research like this, the power of the intelligence of these people is just overwhelming.
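
The generate-and-test idea behind this kind of system can be shown at a very reduced scale. DENDRAL itself reasoned about full molecular structures and fragmentation rules. This sketch only does the first, simplest step, under stated assumptions:

- It enumerates C/H/N/O formulas matching an observed nominal molecular mass (C=12, H=1, N=14, O=16).
- It requires at least one carbon.
- It keeps only formulas whose ring-plus-double-bond count is a non-negative integer.

The function names are illustrative, not DENDRAL's.

```python
def candidate_formulas(nominal_mass):
    """Generate C/H/N/O formulas with the given nominal mass, then test them.

    Nominal masses: C=12, H=1, N=14, O=16. At least one carbon is required.
    A formula passes if its ring-plus-double-bond count,
    RDBE = C - H/2 + N/2 + 1, is a non-negative integer.
    """
    found = []
    for c in range(1, nominal_mass // 12 + 1):
        for n in range((nominal_mass - 12 * c) // 14 + 1):
            for o in range((nominal_mass - 12 * c - 14 * n) // 16 + 1):
                h = nominal_mass - 12 * c - 14 * n - 16 * o  # hydrogens fill the rest
                twice_rdbe = 2 * c - h + n + 2                # 2*RDBE keeps it in integers
                if twice_rdbe >= 0 and twice_rdbe % 2 == 0:
                    found.append((c, h, n, o))
    return found

def to_formula(c, h, n, o):
    """Render counts as a Hill-style formula string, e.g. (2, 4, 0, 0) -> 'C2H4'."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count:
            parts.append(symbol if count == 1 else f"{symbol}{count}")
    return "".join(parts)

print([to_formula(*f) for f in candidate_formulas(28)])
```

As in the system described above, the output is a short list of candidates rather than a single answer. Ranking them is left to a chemist.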

Do you think about expert systems, and why they didn't really become successful? Especially in bioinformatics, where there does seem to be a lot of human expertise, and you can see how a system like this could be made very useful.

It's actually a great question. At my university I teach artificial intelligence, and my first two lectures are on the history of AI, where we go through the main stages of AI. The question of why expert systems failed, or became obsolete, is actually a very interesting one. If you read the historical perspectives, there are two lines of thought. One is that they essentially did not live up to expectations, and therefore they were replaced by other things, right? The other is the complete opposite: that they were too good, and as a result they became a household name and got transformed. In both cases the outcome was the same: they evolved into something, right? And that's what I see if I look at modern machine learning, right?

So there are echoes in modern machine learning.

I think so, I think so. Because if you think about how we design the most successful algorithms, including AlphaFold, right? You build in knowledge about the domain that you study. You build in your expertise.

So, speaking of AlphaFold: DeepMind's AlphaFold 2 was recently announced to have, quote unquote, solved protein folding. How exciting is this to you? It seems to be one of the most exciting things that happened in 2020. It's an incredible accomplishment, from the looks of it. What part of it is amazing to you? What part would you say is overhyped, or maybe misunderstood?

It's definitely a very exciting achievement. To give you a little perspective: in bioinformatics we have several competitions, and the way they're often explained to non-bioinformaticians is as the bioinformatics Olympic Games. There are several disciplines, right? Historically, one of the first was predicting protein structure, predicting the 3D coordinates of a protein. But there are others: predicting protein function, predicting the effects of mutations on protein function, predicting protein-protein interactions. The original one was CASP, the Critical Assessment of protein Structure Prediction. During these competitions, experimental scientists solve structures but don't deposit them in the Protein Data Bank, the centralized database that contains all the 3D coordinates. Instead, they hold them back and release only the protein sequences. The challenge for the community is to predict the 3D structures of these proteins. The experimentally solved structures are then used to assess which prediction is the closest, right?
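
To make "which one is the closest" concrete: a standard CASP measure is GDT_TS. It averages, over distance cutoffs of 1, 2, 4, and 8 Å, the percentage of residues whose predicted position lies within the cutoff of the experimental one. The real score also searches over superpositions of the two structures. This simplified sketch assumes the model is already superimposed on the reference, and adds a plain RMSD for comparison. The coordinates are made up.

```python
import math

def gdt_ts_fixed(pred, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score in [0, 100] for a model already superimposed on the reference."""
    dists = [math.dist(p, r) for p, r in zip(pred, ref)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

def rmsd(pred, ref):
    """Root-mean-square deviation between matched positions (no superposition)."""
    return math.sqrt(sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref)) / len(ref))

# Made-up alpha-carbon positions (in Å) for a four-residue reference and one model.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.5), (3.8, 0.0, 1.5), (7.6, 0.0, 3.0), (11.4, 0.0, 9.0)]
print(gdt_ts_fixed(model, reference), rmsd(model, reference))
```

The two scores react differently to one badly placed residue. RMSD is dominated by the 9 Å outlier. GDT_TS still gives partial credit for the three residues that are close.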

And this competition, by the way... I'm going off on a bunch of different tangents. Maybe you can also say what protein folding is. This competition, the CASP competition, has become the gold standard, and that's what was used to say that protein folding was solved. So just to add a little: whenever you say stuff, maybe throw in some of the basics for folks who might be outside the field.

Yeah. The reason it's relevant to our understanding of protein folding is that we've yet to learn how folding works mechanistically, right? There are different hypotheses about how the fold happens. For example, one hypothesis is that folding also happens in a modular fashion, right? Protein domains fold independently, because their structure is stable, and then the whole protein structure gets formed. But within those domains we also have so-called secondary structure, the small alpha helices and beta sheets. These are elements that are structurally stable, and the question is: when do they get formed? Because for some secondary structure elements, you need a fragment at the beginning of the chain and, say, a fragment in the middle, right? So you cannot have the full fold from the get-go, right? It's still a big enigma. We know that it's an extremely efficient and stable process, right?

So there's this long sequence, and the fold happens really quickly. That's really weird, right? And it happens the same way almost every time. That's really weird. That's freaking weird.

Yeah, that's why it's such an amazing thing.

But most importantly, look at the translation process, right? When the whole protein hasn't been translated yet, when it's still coming out of the ribosome, you already see some structure forming. So folding starts before the whole protein has been produced, right? This is obviously one of the biggest questions in modern molecular biology.

Maybe what happens there is an even bigger question than folding itself. It's the question of the deeper, fundamental idea...

Yes.

...behind folding.

So obviously, if we are able to predict the end product of protein folding, we are one step closer to understanding the mechanisms of protein folding. We can then start probing which parts of this process are critical and which are not so critical; we can start decomposing it. In a way, a protein structure prediction algorithm can be used as a tool, right? You modify the protein, you feed it back into the tool, and it predicts: okay, it's completely unstable.

Yeah: which aspects of the input have a big impact on the output?
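
Using a predictor as a probe, as described above, amounts to an in silico mutational scan. You generate every single-residue substitution, rerun the predictor, and rank the mutations by how much the output changes. In this sketch, `toy_predictor` is a hypothetical stand-in for a real structure or stability predictor: it returns the fraction of hydrophobic residues. The sequence is arbitrary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def single_mutants(seq):
    """Yield (label, mutant_sequence) for every single-residue substitution."""
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                # Label style: wild type, 1-based position, mutant residue (e.g. "M1A").
                yield f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]

def mutational_scan(seq, predict):
    """Rank mutants by the absolute change in the predictor's output, largest first."""
    baseline = predict(seq)
    effects = [(abs(predict(mutant) - baseline), label) for label, mutant in single_mutants(seq)]
    return sorted(effects, reverse=True)

def toy_predictor(seq):
    # Hypothetical stand-in for a real predictor: fraction of hydrophobic residues.
    return sum(aa in "AILMFVW" for aa in seq) / len(seq)

results = mutational_scan("MKV", toy_predictor)
print(len(results), results[:3])
```

A sequence of length n has 19n single mutants, so the scan grows linearly with protein length. The cost is dominated by however expensive the real predictor is.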

So what typically happens is incremental advancement: at each round of the CASP competition, groups make incremental improvements. Historically, the top-performing groups were not using machine learning. They were using very advanced biophysics combined with bioinformatics and data mining. That let them obtain structures of proteins that don't have any structurally solved relatives. If we have another protein, say the same protein from a different species, we can derive some ideas from it. That's so-called homology, or comparative, modeling: we derive ideas from previously known structures, and that helps us tremendously in reconstructing the overall 3D structure. But what happens when we don't have these relatives? That's when it becomes really, really hard, right? That's so-called de novo protein structure prediction. And in this case, those methods were traditionally very good. But then, at the previous CASP, the original AlphaFold came in, and all of a sudden it was much better than everyone else.

Oh, and the competition is only every two years, I think.

And so it was a kind of shockwave to the bioinformatics community that we had a state-of-the-art machine learning system that does structure prediction.
link |
And essentially what it does, you know,
link |
so if you look at this, it actually predicts the context.
link |
So, you know, so the process of reconstructing
link |
the 3D structure starts by predicting the context
link |
between the different parts of the protein.
link |
And the context essentially is the parts of the proteins
link |
that are in a close proximity to each other.
link |
Right, so actually the machine learning part
link |
seems to be estimating, you can correct me if I'm wrong here,
link |
but it seems to be estimating the distance matrix,
link |
which is like the distance between the different parts.
link |
Yeah, so we call the contact map.
link |
So once you have the contact map,
link |
the reconstruction is becoming more straightforward, right?
link |
But so the contact map is the key.
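The distance matrix and the contact map discussed here are closely related: thresholding pairwise residue distances gives the contact map. A minimal sketch, assuming one representative 3D point per residue and an 8 Å cutoff; that cutoff is a commonly used convention but an assumption here, and nothing below claims to reproduce what AlphaFold itself predicts. `math.dist` requires Python 3.8 or later.

```python
import math


def contact_map(coords, threshold=8.0):
    """Pairwise distance matrix and boolean contact map for one structure.

    coords: one (x, y, z) point per residue, in angstroms.
    Two different residues count as "in contact" when their distance is below
    `threshold`; the diagonal (a residue with itself) is never a contact.
    """
    n = len(coords)
    dist = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    contacts = [[i != j and dist[i][j] < threshold for j in range(n)] for i in range(n)]
    return dist, contacts


# Toy hairpin: residues 0-3 run along x, then the chain turns back so that
# residue 6 ends up next to residue 0 even though they are far apart in sequence.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
           (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
dist, contacts = contact_map(hairpin)
print(contacts[0][6], contacts[0][3])
```

Residues 0 and 6 come out as a contact (5 Å apart) while residues 0 and 3 do not (11.4 Å apart): the map captures closeness in space, not closeness in sequence.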
And so, you know, that's what happened.
And now we started seeing in this current stage, right?
Well, in the most recent one,
we started seeing the emergence of these ideas
in other people's work, right?
But yet here's, you know, AlphaFold2
that again outperforms everyone else,
and also by introducing yet another wave
of machine learning ideas.
Yeah, there does seem to also be an incorporation.
First of all, the paper is not out yet,
but there's a bunch of ideas already out.
There does seem to be an incorporation of this other thing.
I don't know if it's something that you could speak to,
which is like the incorporation of other structures,
like evolutionarily similar structures
that are used to kind of give you hints.
Yes, so evolutionary similarity is something
that we can detect at different levels, right?
So we know, for example,
that the structure of proteins
is more conserved than the sequence.
The sequence could be very different,
but the structural shape is actually still very conserved.
So that's sort of an intrinsic property that is, you know,
in a way related to protein folds,
you know, to the evolution
of the proteins and protein domains, et cetera.
But we know that, I mean, there've been multiple studies.
And, you know, ideally, if you have structures,
you know, you should use that information.
However, sometimes we don't have this information.
Instead, we have a bunch of sequences.
Sequences, we have a lot, right?
So we have, you know, hundreds, thousands
of, you know, different organisms sequenced, right?
And by taking the same protein,
but in different organisms, and aligning it,
so making, you know, the corresponding positions
aligned, we can actually say a lot
about what is conserved in this protein,
and therefore, you know, structurally more stable,
and what is diverse in this protein.
So on top of that, we could provide
information about the secondary structure
of this protein, et cetera, et cetera.
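Once the same protein from several organisms has been aligned position by position, "what is conserved" can be scored column by column. A minimal sketch using Shannon entropy per alignment column, one common conservation measure among several; the toy alignment is made up, and treating gaps as an ordinary symbol is a simplifying assumption.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of one alignment column.

    Gaps ('-') are treated as just another symbol, which is a simplification.
    0.0 means every sequence has the same residue (fully conserved).
    """
    counts = Counter(column)
    total = len(column)
    # p * log2(1/p) summed over symbols; written this way it never yields -0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def conservation_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Made-up toy alignment of "the same protein" from four hypothetical species.
alignment = [
    "MKTAYIA",
    "MKSAYLA",
    "MKTGYVA",
    "MRTAYIA",
]
profile = conservation_profile(alignment)
print([round(h, 2) for h in profile])
```

Columns 0, 4 and 6 score 0.0 because they are identical across the toy species; the highest value falls on column 5, where three different residues appear.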
So this information is extremely useful
and it's already there.
So while it's tempting to, you know,
to do a complete ab initio prediction,
so you just have a protein sequence and nothing else,
the reality is that we are overwhelmed with this data.
So why not use it?
And so, yeah, I'm looking forward
to reading this paper.
It does seem to, like they've,
in the previous version of AlphaFold,
they didn't, for this evolutionary similarity thing,
they didn't use machine learning for that.
Or rather, they used it as the input
to the entirety of the neural net,
like the features derived from the similarity.
It seems like there's some kind of, quote, unquote,
iterative thing where part of the learning
process seems to be the incorporation of this evolutionary similarity.
Yeah, I don't think there is a bioRxiv paper, right?
No, there's nothing.
There's a blog post that's written
by a marketing team, essentially,
which, you know, has some scientific similarity,
probably, to the actual methodology used,
but it could be, it's like interpreting scripture.
It could be just poetic interpretations of the actual work
as opposed to a direct connection to the work.
So now, speaking about protein folding, right?
So, you know, in order to answer the question
of whether or not we have solved this, right,
we need to go back to the beginning of our conversation
and the realization about an average protein.
Typically, what CASP has been focusing on,
this competition has been focusing
on single, maybe two-domain proteins
that are still very compact.
And even those ones are extremely challenging to solve.
But now we talk about, you know,
an average protein that has two, three protein domains.
If you look at the proteins that are in charge
of the, you know, of the processes in the neural system,
right, perhaps one of the most recently evolved
sort of systems in an organism, right?
All of them, well, the majority of them,
are highly multi-domain proteins.
So they are, you know, some of them have five, six, seven,
you know, and more domains, right?
And, you know, we are very far away
from understanding how these proteins are folded.
So the complexity of the protein matters here.
The complexity of the protein modules,
or the protein domains.
So you're saying solved, so the definition
of solved here is particularly the CASP competition
achieving human level, not human level,
achieving experimental-level performance
on these particular sets of proteins
that have been used in these competitions.
Well, I mean, you know, I do think that, you know,
especially with regard to AlphaFold,
you know, it is able to, you know, to solve,
you know, at the near-experimental level,
a pretty big majority of the more compact proteins,
or protein domains.
Because again, in order to understand
how the overall protein, you know,
a multi-domain protein, folds, we do need to understand
the structure of its individual domains.
I mean, unlike if you look at AlphaZero
or even MuZero, if you look at that work,
you know, the reinforcement learning
self-play mechanisms are nice
because it's all in simulation.
So you can learn from just huge amounts.
Like you don't need data.
It's like the problem with proteins,
like the size, I forget how many 3D structures
have been mapped, but the training data is very small.
No matter what, it's like millions,
maybe one or two million or something like that,
but it's some very small number,
but like, it doesn't seem like that's scalable.
There has to be, I don't know,
it feels like you want to somehow 10x the data
or 100x the data somehow.
Yes, but we also can take advantage of homology models,
right, so the models that are of very good quality
because they are essentially obtained
based on the evolutionary information, right?
So you can, there is a potential to enhance this information
and, you know, use it again to empower the training set.
And it's, I think, I am actually very optimistic.
I think it's been one of these sort of, you know,
turning points where you have a system that is,
you know, a machine learning system
that is truly better than the,
better than the sort of more conventional
biophysics-based methods.
That's a huge leap.
This is one of those fun questions,
but where would you put it in the ranking
of the greatest breakthroughs
in artificial intelligence history?
So like, okay, so let's see who's in the running.
Maybe you can correct me.
So you've got AlphaZero and AlphaGo
beating the world champion at the game of Go.
Thought to be impossible like 20 years ago.
Or at least the AI community was highly skeptical.
Then you've also got Deep Blue versus Kasparov.
You have deep learning itself,
like maybe, what would you say,
the AlexNet, ImageNet moment.
So the first neural network
achieving human-level performance.
Sorry, that's not true.
Achieving like a big leap in performance
on the computer vision problem.
There is OpenAI, the whole GPT-3,
that whole space of transformers and language models
just achieving this incredible performance
of application of neural networks to language models.
Boston Dynamics, pretty cool.
People are like, there's no AI.
No, no, there's no machine learning currently.
But AI is much bigger than machine learning.
So just the engineering aspect,
I would say it's one of the greatest accomplishments
on the engineering side.
Engineering meaning like mechanical engineering.
Then of course, autonomous vehicles.
You can argue for Waymo,
which is the Google self-driving car.
Or you can argue for Tesla,
which is actually being used
by hundreds of thousands of people on the road today,
a machine learning system.
And I don't know if you can, what else is there?
But I think that's it.
And then AlphaFold, many people are saying
is up there, potentially number one.
Would you put it at number one?
Well, in terms of the impact on science
and on society beyond, it's definitely,
to me, one of the...
Maybe, I mean, I'm probably not the best person...
But I do have, I remember my,
back in, I think, 1997, when Deep Blue
beat Kasparov, it was, I mean, it was a shock.
I mean, it was, and I think for the,
for a pretty substantial part of the world,
especially people who have some experience with chess,
and realizing how incredibly human this game is,
how much brainpower you need
to reach those grandmaster levels, right.
And it's probably one of the first times,
and how good Kasparov was.
And again, yeah, so Kasparov's arguably
one of the best ever, right?
And you get a machine that beats him.
All right, so it's...
First time a machine probably beat a human
at that scale of a thing, of anything.
So that was, to me, that was like, you know,
one of the groundbreaking events in the history of AI.
Yeah, that's probably number one.
Probably, like, it's hard to remember.
It's like Muhammad Ali versus, I don't know,
any of the, Mike Tyson, something like that.
It's like, nah, you gotta put Muhammad Ali at number one.
Same with Deep Blue,
even though it's not machine learning based.
Still, it uses advanced search,
and search is an integral part of AI, right?
It's not, people don't think of it that way at this moment.
In vogue currently, search is not seen
as a fundamental aspect of intelligence,
but it very well, I mean, it very likely is.
In fact, I mean, that's what neural networks are,
they're just performing search
on the space of parameters, and it's all search.
All of intelligence is some form of search,
and you just have to become cleverer and cleverer
at that search problem.
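The idea that training a neural network is a search over the space of parameters can be made concrete in the smallest possible case. A minimal sketch, assuming a one-parameter linear model and a squared-error loss (a deliberate simplification, not how any system named in this conversation is trained): gradient descent starts from a guess and repeatedly steps toward lower error.

```python
def gradient_descent(xs, ys, lr=0.01, steps=1000):
    """Search over the single parameter w of the model y = w * x.

    Each step moves w a little in the direction that lowers the mean
    squared error, i.e. a local search guided by the gradient.
    """
    w = 0.0  # starting point of the search
    n = len(xs)
    for _ in range(steps):
        # derivative of (1/n) * sum((w*x - y)**2) with respect to w
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad  # step "downhill" in parameter space
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so the search should land near w = 2
print(gradient_descent(xs, ys))
```

A real network runs the same loop over millions of parameters at once; the "cleverness" then lies in the model, the loss, and how the steps are chosen.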
And I also have another one that you didn't mention
that's one of my favorite ones,
so you've probably heard of this,
it's, I think it's called The Next Rembrandt.
It's the project where they trained,
I think there was a collaboration
between the sort of experts
in Rembrandt painting in the Netherlands
and a group, an artificial intelligence group,
where they trained an algorithm
to replicate the style of Rembrandt,
and they actually printed a portrait
that never existed before in the style of Rembrandt.
I think they printed it on a sort of,
on a canvas, you know,
using pretty much the same types of paints and stuff.
To me, it was mind-blowing.
Yeah, in the space of art, that's interesting.
There hasn't been, maybe that's it,
but I think there hasn't been an ImageNet moment yet
in the space of art.
You haven't been able to achieve
superhuman-level performance in the space of art,
even though there's this big famous thing
where a piece of art was purchased,
I guess for a lot of money.
Yeah, but it's still, you know,
people are like, in the space of music at least,
that's, you know, it's clear that human-created pieces
are much more popular.
So there hasn't been a moment where it's like,
oh, this is, we're now,
I would say, in the space of music,
what makes a lot of money,
we're talking about serious money,
it's music and movies, or like shows and so on,
and entertainment.
There hasn't been a moment where AI created,
AI was able to create a piece of music
or a piece of cinema, like a Netflix show,
that is, you know, sufficiently popular
to make a ton of money.
And that moment would be very, very powerful,
because that's like, that's an AI system
being used to make a lot of money.
And, of course, AI tools directly,
like even Premiere, audio editing,
all the editing, everything I do
to edit this podcast, there's a lot of AI involved.
Actually, there's this program,
I wanna talk to those folks, just 'cause I wanna nerd out,
it's called iZotope, I don't know if you're familiar with it.
They have a bunch of tools for audio processing,
and they have, I think they're Boston-based,
just, it's so exciting to me to use it,
like on the audio here,
'cause it's all machine learning.
It's not, 'cause most audio production stuff
is like, any kind of processing you do,
it's very basic signal processing,
and you're tuning knobs and so on.
They have all of that, of course,
but they also have all of this machine learning stuff,
where you actually give it training data,
you select parts of the audio you train on,
you train on it, and it figures stuff out.
It's great, it's able to detect,
like the ability of it
to separate voice and music, for example,
or voice and anything, is incredible.
Like it just, it's clearly exceptionally good
at applying these different neural network models
to just separate the different kinds
of signals from the audio.
That, okay, so that's really exciting.
Photoshop, Adobe, people also use it,
but to generate a piece of music
that will sell millions, a piece of art, yeah.
No, I agree, and you know, it's,
that's, you know, as I mentioned,
I teach my AI class, and you know,
an integral part of this is the project, right?
So it's my favorite, ultimate favorite part,
because typically we have these project presentations
the last two weeks of the class,
right before, you know, the Christmas break,
and it's sort of, it adds this cool excitement,
and every time, I mean, I'm amazed, you know,
by some of the projects that people, you know, come up with.
And quite a few of them are actually, you know,
they have some link to the arts.
I mean, you know, I think last year we had a group
who designed an AI producing haikus, Japanese poems.
So, and some of them, so, you know,
it got trained on English-language
haikus, right?
So, and some of them, you know,
they got to present, like, the top selection.
They were pretty good.
I mean, you know, of course, I'm not a specialist,
but you read them, and you see this is real.
It seems profound.
Yes, yeah, it seems real.
So it's kind of cool.
We also had a couple of projects where people tried
to teach AI how to play, like, rock music, classical music,
I think, and popular music.
Interestingly enough, you know,
classical music was among the most difficult ones.
And, you know, of course, if you look at the, you know,
the, like, grandmasters of music, like Bach, right?
So there is a lot of,
there is a lot of almost math.
Yeah, well, he's very mathematical.
So I would imagine that at least some style
of this music could be picked up,
but then you have this completely different spectrum
of classical composers.
And so, you know, it's almost like, you know,
you don't have to sort of look at the data.
You just listen to it and say, nah, that's not it, not yet.
That's not it, yeah.
That's how I feel too.
There's, OpenAI has, I think, MuseNet
or something like that, the system.
It's cool, but it's like, eh,
it's not compelling for some reason.
It could be a psychological reason too.
Maybe we need to have a human being,
a tortured soul behind the music.
Yeah, no, absolutely.
I completely agree.
But yeah, whether or not we'll have,
one day we'll have, you know,
a song written by an AI engine
in the top charts, the music charts,
I wouldn't be surprised.
I wouldn't be surprised.
I wonder if we already have one
and it just hasn't been announced.
How hard is the multi-protein folding problem?
Is that kind of something you've already mentioned,
which is baked into this idea of greater
and greater complexity of proteins?
Like multi-domain proteins,
does that basically become multi-protein complexes?
Yes, you got it right.
So it's sort of, it has the components
of both protein folding
and protein-protein interactions.
Because in order for these domains,
many of these proteins actually,
they never form a stable structure.
One of my favorite proteins,
and pretty much everyone I know
who works with proteins,
they always have their favorite proteins.
Right, so one of my favorite proteins,
probably my favorite protein,
the one that I worked on when I was a postdoc,
is the so-called postsynaptic density 95, the PSD-95 protein.
So it's one of the key actors
in the majority of neurological processes
at the molecular level.
And essentially it's a key player
in the postsynaptic density.
So this is the crucial part of the synapse
where a lot of these chemical processes are happening.
So it has five domains, right?
So five protein domains.
So a pretty large protein, I think 600-something amino acids.
But the way it's organized, it's flexible, right?
So it acts as a scaffold.
So it is used to bring in other proteins,
so they start acting in an orchestrated manner, right?
So, and the shape of this protein,
in a way, there are some stable parts of this protein,
but there are some flexible parts.
And this flexibility is built into the protein
in order for it to become this sort of multifunctional machine.
So do you think that kind of thing is also learnable
through the AlphaFold2 kind of approach?
I mean, time will tell.
Is it another level of complexity?
Is it like, how big of a jump in complexity
is that whole thing?
To me, it's yet another level of complexity
because when we talk about protein-protein interactions,
and there is actually a different challenge for this
called CAPRI, that is focused specifically
on macromolecular interactions, protein-protein, protein...
So, but there are different mechanisms
that govern molecular interactions
and that need to be picked up,
say, by a machine learning algorithm.
Interestingly enough, we actually
participated for a few years in this competition.
We typically don't participate in competitions,
I don't know, don't have enough time,
because it's a very intensive process.
But we participated about 10 years ago or so.
And the way we entered this competition,
we designed a scoring function, right?
So the function that evaluates
whether or not your protein-protein interaction model
looks like an experimentally solved one, right?
So the scoring function is a very critical part
of the model prediction.
So we designed it to be a machine learning one.
And so it was one of the first machine-learning-based
scoring functions used in CAPRI.
And we essentially learned what should contribute,
what are the critical components contributing
to the protein-protein interaction.
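A minimal sketch of what a learned scoring function for docked complexes can look like, assuming logistic regression as the learner and three hypothetical interface descriptors per pose. This is not the scoring function Korkin's group entered in CAPRI (its features and model are not described in this conversation); the descriptor names and numbers are made up so that the example runs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(pose, weights, bias):
    """Estimated probability that a docked pose is near-native."""
    return sigmoid(sum(w * x for w, x in zip(weights, pose)) + bias)


def train_scoring_function(poses, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(poses[0])
    bias = 0.0
    for _ in range(epochs):
        for pose, label in zip(poses, labels):
            err = score(pose, weights, bias) - label  # gradient of log-loss w.r.t. the logit
            weights = [w - lr * err * x for w, x in zip(weights, pose)]
            bias -= lr * err
    return weights, bias


# Hypothetical per-pose descriptors (made-up values):
# [buried surface area / 1000 A^2, interface H-bonds / 10, steric clashes / 10]
poses = [
    [1.8, 0.9, 0.0],  # labeled near-native
    [1.6, 0.7, 0.1],  # labeled near-native
    [0.4, 0.1, 0.6],  # labeled decoy
    [0.6, 0.2, 0.8],  # labeled decoy
]
labels = [1, 1, 0, 0]
weights, bias = train_scoring_function(poses, labels)
print([round(score(p, weights, bias), 3) for p in poses])
```

The learned weights are the "what should contribute" part: their signs and sizes say which descriptors push a pose toward looking native. A real scoring function would use many more features and far more labeled complexes.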
So this could be converted into a learning problem
and thereby it could be learned?
I believe so, yes.
Do you think AlphaFold2 or something similar to it
from DeepMind or somebody else will be,
will result in a Nobel Prize or multiple Nobel Prizes?
So like, you know, obviously, maybe not so obviously,
you can't give a Nobel Prize to a computer program,
at least for now; you give it to the designers of that program.
But do you see one or multiple Nobel Prizes
where AlphaFold2 is like a large percentage
of what that prize is given for?
Would it lead to discoveries at the level of Nobel Prizes?
I mean, I think we are definitely destined
to see the Nobel Prize
evolving with the evolution of science,
and the evolution of science is such
that it now becomes really multifaceted, right?
So where you don't really have a unique discipline,
you have a lot of cross-disciplinary talks
in order to achieve sort of, you know,
really big advancements, you know.
So I think, you know, the computational methods
will be acknowledged in one way or another.
And as a matter of fact, you know,
they were first acknowledged back in 2013, right?
Where, you know, the first three people were, you know,
awarded the Nobel Prize for studying protein folding,
right, the principles.
And, you know, I think all three of them
are computational biophysicists, right?
So, you know, that I think is unavoidable.
You know, it will come with time.
The fact that, you know, AlphaFold and, you know,
similar approaches, because again, it's a matter of time
that people will embrace this, you know, principle
and we'll see more and more such, you know,
such tools coming into play.
But, you know, these methods will be critical
in scientific discovery, no doubt about it.
On the engineering side, maybe a dark question,
but do you think it's possible to use
these machine learning methods
to start to engineer proteins?
And the next question is something quite a few biologists
are against, some are for, for study purposes,
is to engineer viruses.
Do you think machine learning, something like AlphaFold,
could be used to engineer viruses?
So to answer the first question, you know,
it has been, you know, a part of the research
in protein science; protein design is, you know,
a very prominent area of research.
Of course, you know, one of the pioneers is David Baker
and the Rosetta algorithm that, you know,
essentially was doing de novo design and was used
to design new proteins, you know.
And design of proteins means design of function.
So like when you design a protein, you can control,
I mean, the whole point of a protein:
with the protein structure comes a function,
like it's doing something.
So you can design different things.
So you can, yeah, so you can, well,
you can look at proteins from the functional perspective.
You can also look at proteins
from the structural perspective, right?
So the structural building blocks.
So if you want to have a building block
of a certain shape, you can try to achieve it
by, you know, introducing a new protein sequence
and predicting, you know, how it will fold.
So with that, I mean, it's a natural,
one of the, you know, natural applications
of these algorithms.
Now, talking about engineering a virus.
With machine learning.
With machine learning, right?
So, well, you know, luckily for us,
I mean, we don't have that much data, right?
We actually, right now, one of the projects
that we are carrying out in the lab
is we're trying to develop a machine learning algorithm
that determines
whether or not the current strain is pathogenic.
And the current strain of the coronavirus.
I mean, so there are applications to coronaviruses
because we have strains of SARS-CoV-2,
also SARS-CoV and MERS, that are pathogenic,
but we also have strains of other coronaviruses
that are, you know, not pathogenic.
I mean, the common cold viruses and, you know,
some other ones, right?
So, pathogenic meaning spreading?
Pathogenic means actually inflicting damage.
There are also some, you know,
seasonal versus pandemic strains of influenza, right?
And determining what the molecular determinants are
that are built into the protein sequence,
into the gene sequence, right?
So, and whether or not machine learning
can determine those components, right?
So, like, using machine learning to,
that's really interesting, to give
as the input the entire
protein sequence and then determine
if this thing is going to be able to do damage
to a biological system.
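A sequence classifier of the kind described here needs a fixed-length numeric input, while protein sequences vary in length. One standard, generic way to bridge that gap is a k-mer frequency vector. A minimal sketch, assuming k = 2 and the 20 standard amino acids; it is not the feature set of Korkin's lab or of any published study, and the sequence is an arbitrary toy string.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_vocabulary(k):
    """All 20**k possible k-mers, in a fixed order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]


def kmer_vector(sequence, k=2):
    """Normalized k-mer frequencies; k-mers with non-standard letters are skipped."""
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values()) or 1  # avoid dividing by zero when nothing is counted
    return [counts[kmer] / total for kmer in kmer_vocabulary(k)]


vec = kmer_vector("MKTAYIAK")  # arbitrary toy sequence
print(len(vec), round(sum(vec), 6))
```

Every sequence, whatever its length, maps to the same 400-dimensional vector, which can then be fed to any standard supervised classifier; which classifier and which labels to use are separate choices not shown here.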
It's a good machine learning problem.
You're saying we don't have enough data for that?
We, I mean, for this specific one, we do.
We might actually, I have, you know,
have to back up on this because we're still in the process.
There was one work that appeared on bioRxiv
by Eugene Koonin, who is one of these, you know,
pioneers in evolutionary genomics.
And they tried to look at this, but, you know,
the methods were sort of standard, you know,
supervised learning methods.
And now the question is, you know,
can you advance it further by using, you know,
not-so-standard methods, you know?
So there's obviously a lot of hope
in transfer learning, where you can actually try to transfer
the information that the machine learning learns about
the proper protein sequences, right?
And, you know, so there is some promise
in going in this direction, but if we have this,
it would be extremely useful because then
we could essentially forecast the potential mutations
that would make the current strain
more or less pathogenic.
Anticipate them for vaccine development,
for treatment, antiviral drug development.
That would be a very crucial task.
But you could also use that system to then say,
how would we potentially modify this virus
to make it more pathogenic?
That's true.
And then, you know, again,
the hope is, well, several things, right?
So one is that, you know, it's,
even if you design a, you know, a sequence, right?
To carry out the actual experimental biology,
to ensure that all the components are working, you know,
is a completely different matter.
A difficult process.
Then, you know, we've seen in the past,
there could be some regulation the moment
the scientific community recognizes
that it's no longer a sort of fun puzzle
for machine learning.
Yeah, so then there might be some regulation.
So I think back in, what, 2015, there was, you know,
there was an issue with regulating the research
on influenza strains, right?
There were several groups that, you know,
used sort of mutation analysis
to determine whether or not a strain will jump
from one species to another.
And I think there was like a half-year moratorium
on the research, on the paper being published,
until, you know, scientists, you know, analyzed it
and decided that it's actually safe.
I forgot what that's called.
Something of function, test of function.
Gain of function, yeah.
Gain of function, loss of function, that's right.
It's like, let's watch this thing mutate for a while
to see what kind of things we can observe.
I guess I'm not so much worried
about that kind of research if there's a lot of regulation
and if it's done very well, with competence, and seriously.
I am more worried about, you know,
the underlying aspect of this question,
which is more like 50 years from now.
Speaking to the Drake equation,
one of the parameters in the Drake equation
is how long civilizations last.
And that seems to be the most important value, actually,
for calculating whether there are other alien
intelligent civilizations out there.
That's where there's the most variability.
Assuming that the percentage chance
that life can emerge is not zero,
like if we're not super unique,
then how long we last
is basically the most important thing.
So from a selfish perspective,
but also from a Drake equation perspective,
I'm worried about our civilization lasting.
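The Drake equation is a plain product, N = R* × f_p × n_e × f_l × f_i × f_c × L, where L is how long a civilization remains detectable. Because L enters linearly, guesses for L that span several orders of magnitude carry straight through to N, which is one way to read the point that L holds the most variability. A minimal sketch; the parameter values are placeholders chosen for easy arithmetic, not estimates.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime_years):
    """Expected number of detectable civilizations in the galaxy.

    r_star: average star-formation rate (stars per year)
    f_p: fraction of stars with planets
    n_e: habitable planets per star that has planets
    f_l, f_i, f_c: fractions that develop life, intelligence, detectable technology
    lifetime_years: L, how long a civilization stays detectable
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime_years


# Placeholder values, not estimates: hold everything fixed and vary only L.
fixed = dict(r_star=1.0, f_p=0.5, n_e=1.0, f_l=0.1, f_i=0.1, f_c=0.1)
for lifetime in (1_000, 1_000_000):
    print(lifetime, drake(**fixed, lifetime_years=lifetime))
```

With these placeholders, stretching L from a thousand to a million years multiplies N by a thousand, from about 0.5 to about 500, while every other factor stays put.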
link |
And you kind of think about all the ways
link |
in which machine learning can be used
link |
to design greater weapons of destruction, right?
link |
And I mean, one way to ask that
link |
if you look sort of 50 years from now,
link |
a hundred years from now,
link |
would you be more worried about natural pandemics
link |
or engineered pandemics?
link |
Like who's the better designer of viruses,
link |
nature or humans if we look down the line?
I think, in my view, I would still be worried about natural pandemics, simply because of nature's capacity to produce them. It does a pretty good job, right?
And the motivation for engineering viruses as a weapon is a weird one because, and maybe you can correct me on this, it seems very difficult to target a virus, right? The whole point of a weapon, the way a rocket works, is that you have a starting point, you have an endpoint, and you're trying to hit a target. Hitting a target with a virus is very difficult. It's basically just, right? The target would be the human species. Yeah, I have hope in us. I'm forever optimistic that we will not... There's insufficient evil in the world to lead to that kind of destruction.
Well, I also hope that, I mean, that's what we see. With the way we are getting connected, the way the world is getting connected, I think it helps the world become more transparent. The spread of information, I think, is one of the key things for society to become more balanced one way or another.
This is something that people disagree with me on, but I do think that the kind of secrecy that governments have... So you're kind of speaking more to the other aspects, like the research community being more open, companies being more open. Government is still like, we're talking about military secrets. I think military secrets of the kind that could destroy the world will also become a thing of the 20th century. It'll become more and more open. I think nations will lose power in the 21st century, lose enough power that they can no longer keep such secrets. Transparency is more beneficial than secrecy, but of course it's not obvious.
Let's hope so, that governments will become more transparent.
So we last talked, I think, in March or April. What have you learned? How has your philosophical, psychological, biological worldview changed since then? You've been studying it nonstop from a computational biology perspective. How have your understanding of and thoughts about this virus changed over those months, from the beginning to today?
One thing I was really amazed at is how efficient the scientific community was, even judging just by this very narrow domain of protein structure: understanding the structural characterization of this virus from the components' point of view and from the whole-virus point of view. If you look at SARS, something that happened a little less than 20 years ago, and you look at how the scientific community responded when it happened, you see that structural characterization did occur, but it took several years, right? Now the things that took several years are a matter of months. So we see the research pop up. We are at an unprecedented level in terms of sequencing, right? Never before has a single virus been sequenced so many times, which allows us to trace very precisely the evolutionary nature of this virus, what happens. And it's not just this virus independent of everything; it's the sequence of this virus linked, anchored, to a specific geographic place and to specific people, because our genotype also influences the evolution of this virus. It's always host-pathogen coevolution that occurs.
It'd be cool if we also had a lot more data about the spread of this virus. It'd be nice to have it for contact tracing purposes for this virus, but it'd also be nice to have it for studying future viruses, to be able to respond and so on. But it's already nice that we have geographical data and basic data from individual humans, yeah.
Exactly. I think contact tracing is obviously a key component in understanding the spread of this virus. There are also a number of challenges, right? The XPRIZE is one of them; we just recently took part in this competition. It's about predicting the number of infections in different regions. So, obviously, AI is the main topic in those predictions.
Yeah, but the data, I mean, that's a competition, but the training data is weak. It's great, it's probably much more than we've had before, but it'd be nice if it were really rich. I talked to Michael Mina from Harvard. He dreams that the community comes together with something like a weather map for viruses, right? Really high-resolution sensors on how viruses travel from person to person, all the different kinds of viruses, right? Because there's a ton of them. And then you'd be able to tell the story you've spoken about, of the evolution of these viruses, the day-to-day mutations that are occurring. That'd be fascinating just from a study perspective and from the perspective of being able to respond to future pandemics. That's ultimately what I'm worried about.
People love books. Are there three, or whatever number, of books, technical, fiction, philosophical, that brought you joy in life, had an impact on your life, and maybe that you would recommend to others?
I'll give you three very different books, and I also have a special runner-up.
Honorable mention.
I mean, it's an audiobook, and there's a specific reason behind that. So the first book is something that impacted an earlier stage of my life, and I'm probably not going to be very original here. It's Bulgakov's The Master and Margarita.
For a Russian, maybe it's not super original, but it's a really powerful book, even in English.
It is incredibly powerful, and I mean, the way it ends. I still get goosebumps when I read the very last part, it's called the epilogue, where it's just so powerful.
What impact did it have on you? What ideas? What insights did you get from it?
I was just taken by the way you have those parallel lives, many centuries apart, and somehow they get intertwined into one story. That to me was fascinating. And of course the romantic part of this book: it's not just romance, it's romance empowered by magic, right? And maybe on top of that, you have some irony, which is unavoidable, right? Because it was that Soviet time.
But it's very deeply Russian. The wit, the humor, the pain, the love, all of that. It's one of the books that captures something about Russian culture that people outside of Russia should probably read.
I agree.
What's the second one?
So the second one is, again, one that I happened to read later in my life. I think I read it for the first time when I was a graduate student. And that's Solzhenitsyn's Cancer Ward. That is an amazingly powerful book.
What is it about?
It's essentially based on Solzhenitsyn's own experience: he was diagnosed with cancer when he was reasonably young, and he made a full recovery. So this is about a person who was sentenced for life in one of these camps. And he had some cancer, so he was transported back to one of these Soviet republics, I think it was one of the Central Asian republics. And his experience being a prisoner, being a patient in the cancer clinic, in the cancer ward, surrounded by people, many of whom die... First of all, later on I read accounts by doctors who describe the patient's experiences in the book as incredibly accurate. I read that some doctor said every single doctor should read this book to understand what the patient feels. But again, like many of Solzhenitsyn's books, it has multiple levels of complexity. And obviously, if you look beyond the cancer and the patient, the tumor that was growing in the body and then disappeared, with some consequences, is the Soviet regime. And when he was asked, he actually said that this is what made him think about how to combine these experiences: him being a part of the Soviet regime, also being someone sent to a Gulag camp, and also someone who experienced cancer in his life. The Gulag Archipelago and this book, these are the works that actually made him receive a Nobel Prize. But to me, of all the books by Solzhenitsyn, this one is the most powerful.
And by the way, both this one and the previous one you read in Russian?
Yes. So now the third book is an English book, and it's completely different. So we're switching gears completely. This is a book which, it's not even a book, it's an essay by John von Neumann called The Computer and the Brain. And that was the book he was writing knowing that he was dying of cancer. So the book was released back... it's a very thin book, but the intellectual power in this book, in this essay, is incredible. I mean, you probably know that von Neumann is considered to be one of the biggest thinkers. His intellectual power was incredible. And you can actually feel this power in this book, where the person is writing knowing that he will die. The book actually got published only after his death, back in 1958. He died in 1957. So he tried to put in as many ideas as he still could. This book is very difficult to read, because every single paragraph is compact, filled with these ideas. And the ideas are incredible, even nowadays. He tried to draw parallels between the brain's computing power, the neural system, and computers as they were understood.
Do you remember what year he was working on this?
57.
57.
So that was right during his... when he was diagnosed with cancer and he was essentially...
Yeah, he's one of those. There are a few folks people mention, I think Ed Witten is another, where everyone who meets them says he's just an intellectual powerhouse.
Yes.
Okay, so who's the honorable mention?
And this is, I mean, the reason I put it in a separate section is that this is a book I recently listened to. So it's an audiobook. It's a book called Lab Girl by Hope Jahren. Hope Jahren is a scientist, a geochemist who essentially studies fossil plants. She uses chemical analysis of these fossil plants to understand what the climate was like a thousand years, hundreds of thousands of years ago. And something about this book that incredibly touched me: it was narrated by the author.
Nice.
And it's an incredibly personal story, incredibly. In certain parts of the book, you can actually hear the author crying. And to me, I mean, I never experienced anything like that reading a book; it was like a connection between you and the author. And I think this is really a must-read, but even better, a must-listen audiobook for anyone who wants to learn about academia, science, research in general, because it's a very personal account of her becoming...
We're just before New Year's. We talked a lot about some difficult topics, viruses and so on. Do you have some exciting things you're looking forward to in 2021? Some New Year's resolutions, maybe silly or fun, or something very important and fundamental to the world of science, or something completely unimportant?
Well, I'm definitely looking forward to things becoming normal. So yes, I really miss traveling to an international summer school. It's called the School for Molecular and Theoretical Biology. It's held in Europe, and it's organized by very good friends of mine. This is a school for gifted kids from all over the world, and they're incredibly bright. Every time I go there, it's the highlight of the year. We couldn't make it this August, so we did the school remotely, but it's different. So I am definitely looking forward to going there next August. As for my personal resolutions: being in the house and working from home, I realized that I had apparently been missing a lot of time with my family, believe it or not. Typically, with all the research and teaching and everything related to academic life, you get distracted. And so you don't feel that being away from your family affects you, because you are naturally distracted by other things. This time I realized how important that is, right? Spending your time with the family, with your kids. So that would be my New Year's resolution: trying to spend as much time as possible with them, even when the world opens up.
Yeah, that's a beautiful message. That's a beautiful reminder. I asked you if there's a Russian poem that I could force you to read, and you said, okay, fine, sure. Do you mind reading? And you said that no paper is needed.
So this poem was written by my namesake, another Dmitry, Dmitry Kimelfeld. It's a recent poem, and it's called Sorceress, "Vedma" in Russian, or actually "Koldunya," which is another connotation of sorceress or witch. I really like it, and it's one of just a handful of poems I can actually recall by heart. When I read this poem, I also have a very strong association with The Master and Margarita, with its main female character, Margarita. And it's set at about the same time of year as we're talking now, around New Year, around Christmas.
Do you mind reading it in Russian?
I'll give it a try.
So you narrowed your eyes,
that anyone who was blessed
was ready to give their soul to the devil
for this witch's connection.
And I, without prejudice,
ran out to feel your
amazing breath on your lips,
to remember how you flew above the earth
in a white haze, in a white mist.
That's beautiful. I love how it captures a moment of longing, and maybe even love.
Yes. To me it has a lot of meaning: something that is happening, something that is far away but still very close to you. And yes, it's winter.
There's something magical about winter, isn't there? I don't know how to translate it, but a kiss in winter is interesting. Lips in winter and all that kind of stuff. It's beautiful. Russian has a way. It has a reason. Russian poetry is just... I'm a fan of poetry in both languages, but English doesn't capture some of the magic that Russian seems to, so thank you for doing that. That was awesome. Dmitry, it's great to talk to you again. It's contagious how much you love what you do, how much you love life, so I really appreciate you taking the time to talk today.
And thank you for having me.
Thanks for listening to this conversation with Dmitry Korkin, and thank you to our sponsors: Brave Browser, NetSuite Business Management Software, Magic Spoon Low Carb Cereal, and 8sleep Self Cooling Mattress. So the choice is browsing privacy, business success, healthy diet, or comfortable sleep. Choose wisely, my friends. And if you wish, click the sponsor links below to get a discount and to support this podcast.
And now, let me leave you with some words from Jeffrey Eugenides: "Biology gives you a brain. Life turns it into a mind."
Thank you for listening, and hope to see you next time.