Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI | Lex Fridman Podcast #153


link |
00:00:00.000
The following is a conversation with Dmitry Korkin,
link |
00:00:02.880
his second time on the podcast.
link |
00:00:04.840
He's a professor of bioinformatics
link |
00:00:06.960
and computational biology at WPI,
link |
00:00:09.720
where he specializes in bioinformatics
link |
00:00:12.160
of complex disease, computational genomics,
link |
00:00:15.080
systems biology, and biomedical data analytics.
link |
00:00:18.520
He loves biology, he loves computing,
link |
00:00:22.080
plus he is Russian and recites a poem in Russian
link |
00:00:26.140
at the end of the podcast.
link |
00:00:27.760
What else could you possibly ask for in this world?
link |
00:00:31.120
Quick mention of our sponsors, Brave Browser,
link |
00:00:34.720
NetSuite Business Management Software,
link |
00:00:37.800
Magic Spoon Low Carb Cereal,
link |
00:00:40.320
and Eight Sleep Self Cooling Mattress.
link |
00:00:42.960
So the choice is browsing privacy, business success,
link |
00:00:46.400
healthy diet, or comfortable sleep.
link |
00:00:49.240
Choose wisely, my friends, and if you wish,
link |
00:00:51.720
click the sponsor links below
link |
00:00:53.680
to get a discount and to support this podcast.
link |
00:00:56.480
As a side note, let me say that to me,
link |
00:00:58.640
the scientists that did the best,
link |
00:01:00.440
apolitical, impactful, brilliant work of 2020
link |
00:01:04.040
are the biologists who study viruses without an agenda,
link |
00:01:09.200
without much sleep, to be honest,
link |
00:01:11.840
just a pure passion for scientific discovery
link |
00:01:14.520
and exploration of the mysteries within viruses.
link |
00:01:18.440
Viruses are both terrifying and beautiful,
link |
00:01:21.400
terrifying because they can threaten
link |
00:01:23.000
the fabric of human civilization,
link |
00:01:25.160
both biological and psychological.
link |
00:01:27.880
Beautiful because they give us insights
link |
00:01:30.480
into the nature of life on Earth
link |
00:01:32.960
and perhaps even extraterrestrial life
link |
00:01:35.920
of the not so intelligent variety
link |
00:01:38.000
that might meet us one day
link |
00:01:39.600
as we explore the habitable planets and moons
link |
00:01:42.440
in our universe.
link |
00:01:43.800
If you enjoy this thing, subscribe on YouTube,
link |
00:01:45.840
review it on Apple Podcast, follow on Spotify,
link |
00:01:49.080
support on Patreon, or connect with me
link |
00:01:50.960
on Twitter at Lex Fridman.
link |
00:01:53.200
And now here's my conversation with Dmitry Korkin.
link |
00:01:57.960
It's often said that proteins and the amino acid residues
link |
00:02:03.160
that make them up are the building blocks of life.
link |
00:02:06.440
Do you think of proteins in this way
link |
00:02:08.040
as the basic building blocks of life?
link |
00:02:11.200
Yes and no.
link |
00:02:12.240
So the protein indeed is the basic unit,
link |
00:02:16.360
biological unit that carries out
link |
00:02:20.520
important function of the cell.
link |
00:02:22.880
However, through studying the proteins
link |
00:02:25.800
and comparing the proteins across different species,
link |
00:02:29.360
across different kingdoms,
link |
00:02:31.440
you realize that proteins are actually
link |
00:02:34.680
much more complicated.
link |
00:02:36.760
So they have so called modular complexity.
link |
00:02:42.320
And so what I mean by that is an average protein
link |
00:02:47.320
consists of several structural units.
link |
00:02:54.800
So we call them protein domains.
link |
00:02:57.480
And so you can imagine a protein as a string of beads
link |
00:03:02.600
where each bead is a protein domain.
link |
00:03:05.760
And in the past 20 years,
link |
00:03:10.240
scientists have been studying the nature
link |
00:03:13.600
of the protein domains.
link |
00:03:15.040
Cause we realized that it's the unit.
link |
00:03:19.480
Because if you look at the functions, right?
link |
00:03:22.120
So many proteins have more than one function.
link |
00:03:25.920
And those protein functions are often carried out
link |
00:03:29.440
by those protein domains.
link |
00:03:31.560
So we also see that in the evolution,
link |
00:03:37.320
those protein domains get shuffled.
link |
00:03:40.160
So they act actually as a unit.
link |
00:03:43.440
Also from the structural perspective, right?
link |
00:03:45.280
So, you know, some people think of a protein
link |
00:03:50.960
as a sort of a globular molecule.
link |
00:03:55.320
But as a matter of fact, the globular part
link |
00:04:00.080
of this protein is a protein domain.
link |
00:04:02.520
So we often have this, you know,
link |
00:04:05.640
again, the collection of these protein domains
link |
00:04:09.600
aligned on a string as beads.
link |
00:04:14.760
And the protein domains are made up of amino acid residues.
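As an illustrative aside to the transcript: the "string of beads" picture can be sketched as a small data structure. This is only a toy model under stated assumptions. The class names `Domain` and `Protein`, the protein names, and all residue ranges below are hypothetical and do not describe any real protein.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Domain:
    name: str   # hypothetical label, not a real database identifier
    start: int  # first residue of the domain (1-based, inclusive)
    end: int    # last residue of the domain (inclusive)


@dataclass
class Protein:
    name: str
    length: int            # total number of amino acid residues
    domains: List[Domain]  # the "beads", ordered along the chain

    def is_single_domain(self) -> bool:
        # A small protein is often just one bead.
        return len(self.domains) == 1

    def linker_residues(self) -> int:
        # Residues on the "string" that fall outside every bead.
        covered = sum(d.end - d.start + 1 for d in self.domains)
        return self.length - covered


# Made-up examples: one single-domain protein, one multi-domain protein.
small = Protein("toy_small", 120, [Domain("A", 1, 120)])
large = Protein(
    "toy_large",
    500,
    [Domain("A", 10, 130), Domain("B", 160, 300), Domain("C", 340, 490)],
)
```

In this sketch the residues are the lowest level, the domains group contiguous residues into functional units, and the protein is the ordered list of those units.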
link |
00:04:17.880
So we're talking.
link |
00:04:18.720
So this is the basic,
link |
00:04:20.600
so you're saying the protein domain
link |
00:04:22.560
is the basic building block of the function
link |
00:04:25.600
that we think about proteins doing.
link |
00:04:28.280
So, of course, you can always talk about
link |
00:04:30.360
different building blocks with turtles all the way down.
link |
00:04:32.840
But there's a point where there is at the point
link |
00:04:36.840
of the hierarchy where it's the most, the cleanest
link |
00:04:40.680
element block based on which you can put them together
link |
00:04:46.240
in different kinds of ways to form complex function.
link |
00:04:49.200
And you're saying protein domains,
link |
00:04:50.880
why is that not talked about as often in popular culture?
link |
00:04:55.160
Well, you know, there are several perspectives on this.
link |
00:04:59.280
And one, of course, is the historical perspective, right?
link |
00:05:03.200
So historically, scientists have been able
link |
00:05:07.800
to structurally resolve, to obtain the 3D coordinates
link |
00:05:12.400
of a protein for, you know, for smaller proteins.
link |
00:05:17.560
And smaller proteins tend to be a single domain protein.
link |
00:05:21.000
So we have a protein equal to a protein domain.
link |
00:05:24.000
And so because of that, the initial suspicion was
link |
00:05:27.360
that the proteins are, they have globular shapes
link |
00:05:31.720
and the more of smaller proteins you obtain structurally,
link |
00:05:36.840
the more you became convinced that that's the case.
link |
00:05:41.840
And only later when we started having, you know,
link |
00:05:47.920
alternative approaches.
link |
00:05:49.640
So, you know, the traditional ones are X-ray crystallography
link |
00:05:55.920
and NMR spectroscopy.
link |
00:05:57.320
So these are sort of the two main techniques
link |
00:06:02.000
that give us the 3D coordinates.
link |
00:06:04.440
But nowadays, there's huge breakthrough
link |
00:06:07.760
in cryo-electron microscopy.
link |
00:06:10.480
So the more advanced methods that allow us to, you know,
link |
00:06:15.480
to get into the, you know, 3D shapes
link |
00:06:19.640
of much larger molecules, molecular complexes,
link |
00:06:23.480
just to give you one of the common examples for this year.
link |
00:06:29.200
Right?
link |
00:06:30.040
So the first experimental structure
link |
00:06:32.760
of a SARS-CoV-2 protein was the cryo-EM structure
link |
00:06:38.360
of the S protein.
link |
00:06:40.160
So the spike protein.
link |
00:06:41.960
And so it was solved very quickly.
link |
00:06:46.320
And the reason for that is the advancement
link |
00:06:49.480
of this technology is pretty spectacular.
link |
00:06:53.920
How many domains is the, is it more than one domain?
link |
00:06:57.480
Oh, yes.
link |
00:06:58.320
Oh, yes, I mean, so it's a very complex structure.
link |
00:07:01.320
And we, you know, on top of the complexity
link |
00:07:06.480
of a single protein, right?
link |
00:07:08.520
So this structure is actually, is a complex, is a trimer.
link |
00:07:13.720
So it needs to form a trimer in order to function properly.
link |
00:07:17.640
What's a complex?
link |
00:07:18.760
So a complex is an agglomeration of multiple proteins.
link |
00:07:22.920
And so we can have the same protein copied
link |
00:07:28.240
in multiple, you know, made up in multiple copies
link |
00:07:32.120
and forming something that we call a homo-oligomer.
link |
00:07:36.200
Homo means the same, right?
link |
00:07:38.160
So, so in this case, so the spike protein is the,
link |
00:07:42.840
is an example of a homo-tetramer, homo-trimer, sorry.
link |
00:07:46.800
So means three copies of a three copies in order to.
link |
00:07:50.040
Exactly.
link |
00:07:50.880
We have these three chains,
link |
00:07:52.760
the three molecular chains coupled together
link |
00:07:56.800
and performing the function.
link |
00:07:58.480
That's what, when you look at this protein from the top,
link |
00:08:02.360
you see a perfect triangle.
link |
00:08:04.560
So, but other, you know, so other complexes are made up
link |
00:08:08.880
of, you know, different proteins.
link |
00:08:12.840
Some of them are completely different.
link |
00:08:15.400
Some of them are similar, the hemoglobin molecule, right?
link |
00:08:18.880
So it's actually, it's a protein complex.
link |
00:08:21.880
It's made of four basic subunits.
link |
00:08:25.760
Two of them are identical to each other.
link |
00:08:29.040
Two other are identical to each other,
link |
00:08:30.800
but they are also similar to each other,
link |
00:08:32.840
which sort of gives us some ideas about the evolution
link |
00:08:36.960
of this, you know, of this molecule.
link |
00:08:40.640
And perhaps one of the hypotheses that, you know,
link |
00:08:44.000
in the past, it was just a homo-tetramer, right?
link |
00:08:48.280
So four identical copies, and then it became, you know,
link |
00:08:53.120
sort of modified, it became mutated over the time
link |
00:08:58.520
and became more specialized.
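As an illustrative aside to the transcript: the distinction between complexes built from identical copies and complexes built from different chains can be sketched as a short classifier. The function names `classify_complex` and `composition` are hypothetical, and the chain labels are just tags chosen for this example (the labels "alpha" and "beta" follow the usual naming of the two kinds of hemoglobin subunit).

```python
from collections import Counter
from typing import Dict, List

# Prefixes for the complex sizes mentioned in the conversation.
PREFIX = {2: "di", 3: "tri", 4: "tetra", 5: "penta"}


def composition(chains: List[str]) -> Dict[str, int]:
    """Count how many copies of each distinct chain a complex contains."""
    return dict(Counter(chains))


def classify_complex(chains: List[str]) -> str:
    """Name a complex from its chain labels; equal labels mean identical copies."""
    n = len(chains)
    if n == 1:
        return "monomer"
    kind = "homo" if len(set(chains)) == 1 else "hetero"
    size = PREFIX.get(n, f"{n}-") + "mer"
    return kind + size


spike = ["S", "S", "S"]                          # three identical chains
hemoglobin = ["alpha", "alpha", "beta", "beta"]  # two pairs of identical chains

spike_kind = classify_complex(spike)
hemoglobin_kind = classify_complex(hemoglobin)
```

The classifier only looks at whether chains are identical. It does not capture the point made above that the two kinds of hemoglobin subunit are also similar to each other, which is what hints at a shared evolutionary origin.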
link |
00:09:00.160
Can we linger on the spike protein for a little bit?
link |
00:09:02.560
Is there something interesting
link |
00:09:04.920
or like beautiful that you find about it?
link |
00:09:06.960
I mean, first of all, it's an incredibly challenging protein.
link |
00:09:10.960
And so we, as a part of our sort of research
link |
00:09:16.120
to understand the structural basis of this virus
link |
00:09:20.200
to sort of decode, structurally decode
link |
00:09:22.760
every single protein in its proteome,
link |
00:09:27.560
which, you know, we've been working on the spike protein.
link |
00:09:31.760
And one of the main challenges was that
link |
00:09:34.440
cryo-EM data allows us to reconstruct
link |
00:09:42.720
or to obtain the 3D coordinates
link |
00:09:44.640
of roughly two thirds of the protein.
link |
00:09:48.040
The rest of the one third of this protein,
link |
00:09:51.960
it's a part that is buried into the membrane of the virus
link |
00:09:58.400
and of the viral envelope.
link |
00:10:01.560
And it also has a lot of unstable structures around it.
link |
00:10:06.920
So it's chemically interacting somehow
link |
00:10:08.640
with whatever the heck it's connecting?
link |
00:10:10.160
Yeah, so people are still trying to understand.
link |
00:10:12.800
So the nature of, and the role of this, you know,
link |
00:10:17.040
of this one third,
link |
00:10:18.560
because the top part, you know,
link |
00:10:21.320
the primary function is to get attached to the, you know,
link |
00:10:26.720
ACE2 receptor, human receptor.
link |
00:10:29.560
There is also beautiful, you know,
link |
00:10:33.120
mechanics of how this thing happens, right?
link |
00:10:36.080
So because there are three different copies
link |
00:10:38.720
of these chains, you know,
link |
00:10:41.960
there are three different domains, right?
link |
00:10:44.840
So we're talking about domains.
link |
00:10:46.080
So these are the receptor binding domains, RBDs,
link |
00:10:49.240
that get untangled
link |
00:10:51.680
and get ready to get attached to the receptor.
link |
00:10:56.680
And now they are not necessarily going in a sync mode.
link |
00:11:04.160
As a matter of fact.
link |
00:11:05.360
Asynchronous?
link |
00:11:06.640
So, yes.
link |
00:11:07.800
So, and this is where, you know,
link |
00:11:11.000
another level of complexity comes into play
link |
00:11:13.600
because right now what we see is,
link |
00:11:17.120
we typically see just one of the arms going out
link |
00:11:21.840
and getting ready to be attached
link |
00:11:24.400
to the ACE2 receptors.
link |
00:11:27.560
However, there was a recent mutation
link |
00:11:30.360
that people studied in that spike protein
link |
00:11:35.080
and a very recently,
link |
00:11:39.720
a group from UMass Medical School
link |
00:11:43.560
we happen to collaborate with the groups.
link |
00:11:45.280
So this is a group of Jeremy Luban
link |
00:11:47.240
and a number of other faculty.
link |
00:11:50.640
They actually solved the mutated structure of the spike
link |
00:11:59.040
and they showed that actually,
link |
00:12:01.640
because of these mutations,
link |
00:12:03.000
you have more than one arm opening up.
link |
00:12:08.840
And so now, so the frequency of two arms going up
link |
00:12:15.040
increased quite drastically.
link |
00:12:18.120
How does that change the dynamics somehow?
link |
00:12:21.160
It potentially can change the dynamics of,
link |
00:12:24.360
because now you have two possible opportunities
link |
00:12:28.440
to get attached to the ACE2 receptor.
link |
00:12:31.160
It's a very complex molecular process,
link |
00:12:34.000
mechanistic process.
link |
00:12:35.320
But the first step of this process
link |
00:12:37.440
is the attachment of this spike protein,
link |
00:12:41.520
of the spike trimer to the human ACE2 receptor.
link |
00:12:46.520
So this is a molecule that sits on the surface
link |
00:12:49.960
of the human cell.
link |
00:12:51.880
And that's essentially what triggers
link |
00:12:55.400
the whole process of encapsulation.
link |
00:12:58.840
If this was dating, this would be the first date.
link |
00:13:01.360
So this is the...
link |
00:13:03.000
In a way, yes.
link |
00:13:05.600
So is it possible that the spike protein
link |
00:13:07.880
is just like floating about on its own
link |
00:13:10.560
or does it need that interactability with the membrane?
link |
00:13:14.600
Yeah, so it needs to be attached,
link |
00:13:16.880
at least as far as I know.
link |
00:13:19.000
But when you get this thing attached on the surface,
link |
00:13:23.280
there is also a lot of dynamics
link |
00:13:25.120
on how it sits on the surface.
link |
00:13:28.160
So for example, there was a recent work in,
link |
00:13:32.160
again, where people use the cryo-electron microscopy
link |
00:13:35.760
to get the first glimpse of the overall structure.
link |
00:13:38.920
It's a very low res,
link |
00:13:40.120
but you still get some interesting details
link |
00:13:43.800
about the surface, about what is happening inside
link |
00:13:47.040
because we had literally no clue
link |
00:13:49.040
until recent work about how the capsid is organized.
link |
00:13:54.520
What's the capsid?
link |
00:13:55.360
So capsid is essentially the inner core
link |
00:13:59.080
of the viral particle where there is the RNA of the virus
link |
00:14:05.040
and it's protected by another protein, N protein,
link |
00:14:10.280
that essentially acts as a shield.
link |
00:14:13.480
But now we are learning more and more,
link |
00:14:16.560
so it's actually, it's not just this shield,
link |
00:14:18.640
it's potentially used for the stability
link |
00:14:21.840
of the outer shell of the virus.
link |
00:14:25.080
So it's pretty complicated.
link |
00:14:27.880
And understanding all of this is really useful
link |
00:14:30.520
for trying to figure out like developing a vaccine
link |
00:14:33.040
or some kind of drug to attack any aspects of this, right?
link |
00:14:36.080
So I mean, there are many different implications to that.
link |
00:14:39.640
First of all, it's important to understand the virus itself.
link |
00:14:44.600
So in order to understand how it acts,
link |
00:14:51.600
what is the overall mechanistic process
link |
00:14:55.360
of this virus replication
link |
00:14:57.360
of this virus proliferation to the cell, right?
link |
00:15:00.600
So that's one aspect.
link |
00:15:02.640
The other aspect is designing new treatments.
link |
00:15:06.480
So one of the possible treatments is, you know,
link |
00:15:10.560
designing nanoparticles.
link |
00:15:12.480
And so some nanoparticles that will resemble the viral shape
link |
00:15:17.200
that would have the spike integrated
link |
00:15:19.520
and essentially would act as a competitor to the real virus
link |
00:15:23.680
by blocking the ACE2 receptors
link |
00:15:26.680
and thus preventing the real virus entering the cell.
link |
00:15:31.880
Now, there are also, you know,
link |
00:15:33.920
there is a very interesting direction
link |
00:15:36.720
in looking at the membrane,
link |
00:15:39.400
at the envelope portion of the protein
link |
00:15:42.000
and attacking its M protein.
link |
00:15:45.960
So there are, you know, to give you, you know,
link |
00:15:49.440
sort of a brief overview,
link |
00:15:51.200
there are four structural proteins
link |
00:15:53.320
so these are the proteins that make up
link |
00:15:55.760
the structure of the virus.
link |
00:15:59.320
So spike S protein that acts as a trimer.
link |
00:16:04.080
So it needs three copies.
link |
00:16:07.240
E envelope protein that acts as a pentamer.
link |
00:16:10.800
So it needs five copies to act properly.
link |
00:16:13.720
M is a membrane protein.
link |
00:16:17.920
It forms dimers and actually it forms a beautiful lattice.
link |
00:16:21.680
And this is something that we've been studying
link |
00:16:23.520
and we are seeing it in simulations.
link |
00:16:25.680
It actually forms a very nice grid
link |
00:16:28.080
or, you know, threads, you know,
link |
00:16:31.720
of different dimers attached next to each other.
link |
00:16:34.880
There's a bunch of copies of each other
link |
00:16:36.240
and they naturally, when you have a bunch of copies
link |
00:16:38.280
of each other, they form an interesting lattice.
link |
00:16:40.240
Exactly.
link |
00:16:41.080
And, you know, if you think about this, right?
link |
00:16:43.520
So this complex, you know,
link |
00:16:46.920
the viral shape needs to be organized somehow,
link |
00:16:51.120
self organized somehow, right?
link |
00:16:53.520
So, you know, if it was a completely random process,
link |
00:16:57.920
you know, you probably wouldn't have the envelope shell
link |
00:17:02.120
of the ellipsoid shape.
link |
00:17:03.720
You know, you would have something, you know,
link |
00:17:05.880
pretty random shape, right?
link |
00:17:07.640
So there is some, you know, regularity
link |
00:17:10.600
in how these, you know, how these M dimers
link |
00:17:16.720
get to attach to each other in a very specific,
link |
00:17:19.600
directed way.
link |
00:17:20.560
Is that understood at all?
link |
00:17:23.080
It's not understood.
link |
00:17:24.320
We are now, we've been working in the past six months
link |
00:17:28.440
since, you know, we met.
link |
00:17:29.880
Actually, this is where we started working
link |
00:17:32.160
on trying to understand the overall structure
link |
00:17:35.000
of the envelope and the key components
link |
00:17:38.200
that make up this, you know, structure.
link |
00:17:41.080
Wait, does the envelope also have the lattice structure
link |
00:17:43.240
or no?
link |
00:17:44.080
So the envelope essentially is the outer shell
link |
00:17:47.360
of the viral particle.
link |
00:17:48.840
The N, the nucleocapsid protein,
link |
00:17:51.600
is something that is inside.
link |
00:17:54.000
Got it.
link |
00:17:54.840
But get that.
link |
00:17:56.520
The N is likely to interact with M.
link |
00:17:59.520
Does it go M and E?
link |
00:18:01.480
Like where's the E and the N?
link |
00:18:02.640
So E, those different proteins,
link |
00:18:05.640
they occur in different copies on the viral particle.
link |
00:18:10.800
So E, this pentamer complex,
link |
00:18:13.960
we only have two or three maybe per each particle.
link |
00:18:18.520
Okay, we have 1,000 or so of M dimers
link |
00:18:24.520
that essentially makes up the entire outer shell.
link |
00:18:30.920
So most of the outer shell is the M.
link |
00:18:33.680
M dimer and the M protein.
link |
00:18:35.640
When you say particle, that's the virion, the virus,
link |
00:18:39.120
the individual virus.
link |
00:18:40.280
It's a single, yes.
link |
00:18:41.120
Single element of the virus, single virus.
link |
00:18:43.640
Single virus, right.
link |
00:18:45.080
And we have about, you know, roughly 50 to 90 spike trimers.
link |
00:18:50.840
Right, so when you show a...
link |
00:18:54.000
Per virus particle.
link |
00:18:55.080
Per virus particle.
link |
00:18:56.560
So what did you say, 50 to 90?
link |
00:18:58.640
50 to 90, right.
link |
00:19:00.680
So this is how this thing is organized.
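As an illustrative aside to the transcript: the approximate numbers quoted above (spike as a trimer with roughly 50 to 90 trimers per particle, E as a pentamer with two or three pentamers, M as a dimer with about 1,000 dimers) can be turned into a rough count of individual protein chains per viral particle. This is back-of-the-envelope arithmetic on the figures as spoken, not a measured result. The dictionary layout and the function name `chain_counts` are assumptions made for this sketch.

```python
# (low, high) number of complexes per viral particle, as quoted in the conversation.
ASSEMBLIES = {
    "S": {"copies_per_complex": 3, "complexes": (50, 90)},      # spike trimers
    "E": {"copies_per_complex": 5, "complexes": (2, 3)},        # envelope pentamers
    "M": {"copies_per_complex": 2, "complexes": (1000, 1000)},  # "1,000 or so" M dimers
}


def chain_counts(assemblies):
    """Return the (low, high) number of individual chains for each protein."""
    result = {}
    for protein, info in assemblies.items():
        low, high = info["complexes"]
        copies = info["copies_per_complex"]
        result[protein] = (low * copies, high * copies)
    return result


counts = chain_counts(ASSEMBLIES)
for protein, (low, high) in counts.items():
    print(f"{protein}: {low} to {high} chains per particle")
```

Under these rough figures, M chains outnumber spike chains by about an order of magnitude, which is consistent with the statement that most of the outer shell is made of M.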
link |
00:19:04.000
And so now, typically, right,
link |
00:19:06.360
so you see the antibodies that target spike protein,
link |
00:19:11.360
you know, spike protein,
link |
00:19:13.200
certain parts of the spike protein,
link |
00:19:15.120
but there could be some, also some treatments, right?
link |
00:19:17.920
So these are, you know, these are small molecules
link |
00:19:21.960
that bind strategic parts of these proteins
link |
00:19:27.480
disrupting its function.
link |
00:19:29.640
So one of the promising directions,
link |
00:19:34.000
it's one of the newest directions,
link |
00:19:35.560
is actually targeting the M dimer of the protein.
link |
00:19:40.560
Targeting the proteins that make up this outer shell.
link |
00:19:44.120
Because if you're able to destroy the outer shell,
link |
00:19:47.640
you're essentially destroying the viral particle itself.
link |
00:19:52.160
So preventing it from, you know, functioning at all.
link |
00:19:56.680
So that's, you think,
link |
00:19:58.000
is from a sort of cybersecurity perspective,
link |
00:20:01.440
virus security perspective,
link |
00:20:02.920
that's the best attack vector is,
link |
00:20:05.720
or like that's a promising attack vector.
link |
00:20:08.440
I would say yes.
link |
00:20:09.280
So I mean, there's still tons of research that needs to be,
link |
00:20:12.680
you know, to be done.
link |
00:20:13.960
But yes, I think, you know, so.
link |
00:20:16.560
There's more attack surface, I guess.
link |
00:20:18.840
More attack surface,
link |
00:20:19.840
but you know, from our analysis,
link |
00:20:22.240
from our evolutionary analysis,
link |
00:20:24.160
this protein is evolutionarily more stable
link |
00:20:27.960
compared to the spike protein.
link |
00:20:31.480
Stable means a more static target.
link |
00:20:35.480
Well, yeah, so it doesn't change.
link |
00:20:38.360
It doesn't evolve from the evolutionary perspective
link |
00:20:42.120
so drastically as, for example, the spike protein.
link |
00:20:46.000
There's a bunch of stuff in the news
link |
00:20:47.960
about mutations of the virus in the United Kingdom.
link |
00:20:51.440
I also saw in South Africa something,
link |
00:20:54.160
maybe that was yesterday.
link |
00:20:56.360
You just kind of mentioned about stability and so on.
link |
00:21:00.200
Which aspects of this are mutatable
link |
00:21:02.800
and which aspects, if mutated, become more dangerous.
link |
00:21:07.600
And maybe even zooming out,
link |
00:21:09.280
what are your thoughts and knowledge and ideas
link |
00:21:12.080
about the way it's mutated,
link |
00:21:13.680
all the news that we've been hearing.
link |
00:21:15.400
Are you worried about it from a biological perspective?
link |
00:21:18.520
Are you worried about it from a human perspective?
link |
00:21:21.320
So I mean, you know, mutations are sort of a general way
link |
00:21:26.320
for these viruses to evolve, right?
link |
00:21:28.640
So it's, you know, it's essentially,
link |
00:21:32.680
this is the way they evolve.
link |
00:21:34.760
This is the way they were able to jump from, you know,
link |
00:21:39.960
one species to another.
link |
00:21:42.080
We also see, you know, some recent jumps.
link |
00:21:46.800
There were some incidents of this virus jumping
link |
00:21:50.000
from human to dogs.
link |
00:21:51.880
So, you know, there is some danger in those jumps
link |
00:21:55.880
because, you know, every time it jumps, it also mutates, right?
link |
00:21:59.520
So it, when it jumps to the species
link |
00:22:04.400
and jumps back, right?
link |
00:22:06.160
So it acquires some mutations that are sort of driven
link |
00:22:12.240
by the environment of a new host, right?
link |
00:22:16.360
And it's different from the human environment.
link |
00:22:19.280
And so we don't know whether the mutations
link |
00:22:21.440
that are acquired in the new species are neutral
link |
00:22:26.200
with respect to the human host
link |
00:22:28.120
or maybe, you know, maybe damaging.
link |
00:22:32.080
Yeah, change is always scary, but so are you worried about,
link |
00:22:36.520
I mean, it seems like because the spread is during winter,
link |
00:22:39.720
now it seems to be exceptionally high.
link |
00:22:43.560
And especially with a vaccine just around the corner,
link |
00:22:46.760
already being actually deployed,
link |
00:22:49.160
there's some worry that there's,
link |
00:22:51.320
this puts evolutionary pressure,
link |
00:22:53.000
selective pressure on the virus,
link |
00:22:55.600
for it to, to mutate.
link |
00:22:59.000
Is that a source of worry?
link |
00:23:00.400
Well, I mean, there is always this thought, you know,
link |
00:23:03.400
in the scientist's mind, you know, what happened,
link |
00:23:07.080
what will happen, right?
link |
00:23:08.680
So I know there've been discussions about sort
link |
00:23:13.680
of the arms race between the, you know,
link |
00:23:16.760
the ability of the humanity to, you know,
link |
00:23:23.320
to get vaccinated faster than the virus,
link |
00:23:27.040
you know, essentially, you know, becomes, you know,
link |
00:23:32.120
resistant to the vaccine.
link |
00:23:35.600
I, I mean, I don't worry that much,
link |
00:23:41.760
simply because, you know, there is not that much evidence
link |
00:23:46.200
to that, to aggressive mutation around the vaccine.
link |
00:23:50.040
Exactly, you know, obviously there are mutations
link |
00:23:52.560
around the vaccine, you know, there are vaccines.
link |
00:23:54.920
So the reason we get vaccinated every year
link |
00:24:00.000
against the seasonal flu is the mutations, right?
link |
00:24:03.920
But, you know, I think it's important to study it.
link |
00:24:08.520
No doubts, right?
link |
00:24:09.720
So, so I think one of the, you know, to me,
link |
00:24:12.640
and again, I might be biased because, you know,
link |
00:24:16.040
we've been trying to do that as well.
link |
00:24:20.080
So, but one of the critical directions
link |
00:24:22.920
in understanding the virus is to understand its evolution
link |
00:24:26.560
in order to sort of understand the mechanisms,
link |
00:24:30.200
the key mechanisms that lead the virus to jump,
link |
00:24:33.640
you know, the Nordic viruses to jump from species,
link |
00:24:36.960
from species to another,
link |
00:24:38.640
the mechanisms that lead the virus
link |
00:24:41.680
to become resistant to vaccines also to treatments, right?
link |
00:24:47.680
And hopefully that knowledge will enable us
link |
00:24:51.480
to sort of forecast the evolutionary traces,
link |
00:24:55.480
the future evolutionary traces of this virus.
link |
00:24:58.080
I mean, what, from a biological perspective,
link |
00:25:00.920
this might be a dumb question,
link |
00:25:02.120
but is there parts of the virus that if souped up
link |
00:25:07.880
like through mutation could make it more effective
link |
00:25:11.720
at doing its job?
link |
00:25:12.560
We're talking about the specific coronavirus, like,
link |
00:25:15.640
because we were talking about the different,
link |
00:25:16.840
like the membrane, the M protein, the E protein,
link |
00:25:20.880
the N and the S, the spike.
link |
00:25:25.520
Is there some?
link |
00:25:26.520
And there are 20 or so more in addition to that.
link |
00:25:30.280
But is that a dumb way to look at it?
link |
00:25:32.320
Like, which of these, if mutated,
link |
00:25:37.000
could have the greatest impact, potentially damaging impact
link |
00:25:42.000
on the effectiveness of the virus?
link |
00:25:43.480
So, it's actually, it's a very good question
link |
00:25:46.560
because, and the short answer is we don't know yet,
link |
00:25:50.120
but of course there is capacity of this virus
link |
00:25:53.480
to become more efficient.
link |
00:25:55.560
The reason for that is, you know,
link |
00:25:58.720
so if you look at the virus, I mean, it's a machine, right?
link |
00:26:01.800
So it's a machine that does a lot of different functions.
link |
00:26:05.520
And many of these functions are sort of nearly perfect,
link |
00:26:08.480
but they're not perfect.
link |
00:26:09.800
And those mutations can make those functions more perfect.
link |
00:26:14.120
For example, the attachment to ACE2 receptor, right,
link |
00:26:18.240
of the spike, right?
link |
00:26:19.400
So, you know, has this virus reached the efficiency
link |
00:26:28.360
in which the attachment is carried out?
link |
00:26:31.560
Or are there some mutations that are still to be discovered,
link |
00:26:36.080
right, that will make this attachment sort of stronger,
link |
00:26:41.920
or, you know, something in a way more efficient
link |
00:26:46.920
from the point of view of this virus functioning.
link |
00:26:50.600
That's sort of the obvious example,
link |
00:26:53.440
but if you look at each of these proteins,
link |
00:26:56.040
I mean, it's there for a reason, it performs a certain function.
link |
00:26:59.360
And it could be that certain mutations will,
link |
00:27:05.520
you know, enhance this function.
link |
00:27:07.160
It could be that some mutations will make this function
link |
00:27:10.160
much less efficient, right?
link |
00:27:12.280
So that's also the case.
link |
00:27:14.360
Let's, since we're talking about the evolution
link |
00:27:17.200
and the history of a virus, let's zoom back out
link |
00:27:21.200
and look at the evolution of proteins.
link |
00:27:23.800
I glanced at this 2010 Nature paper on the quote,
link |
00:27:29.720
ongoing expansion of the protein universe.
link |
00:27:32.920
And then, you know, it kind of implies and talks about
link |
00:27:38.200
that proteins started with a common ancestor,
link |
00:27:41.160
which is kind of interesting.
link |
00:27:43.240
It's interesting to think about like even just like
link |
00:27:45.440
the first organic thing that started life on Earth.
link |
00:27:50.720
And from that, there's now, you know,
link |
00:27:54.280
what is it, 3.5 billion years later,
link |
00:27:56.800
there's now millions of proteins and they're still evolving.
link |
00:28:00.080
And that's, you know, in part, one of the things
link |
00:28:02.360
that you're researching, is there something interesting
link |
00:28:05.160
to you about the evolution of proteins
link |
00:28:08.760
from this initial ancestor to today?
link |
00:28:14.640
Is there something beautiful and insightful
link |
00:28:16.320
about this long story?
link |
00:28:18.160
So I think, you know, if I were to pick a single keyword
link |
00:28:24.120
about protein evolution, I would pick modularity,
link |
00:28:29.360
something that we talked about in the beginning.
link |
00:28:32.920
And that's the fact that the proteins are no longer considered
link |
00:28:37.920
as, you know, as a sequence of letters.
link |
00:28:41.360
There are hierarchical complexities
link |
00:28:45.960
in the way these proteins are organized.
link |
00:28:48.320
And these complexities are actually going beyond
link |
00:28:52.800
the protein sequence.
link |
00:28:54.000
It's actually going all the way back to the gene,
link |
00:28:57.800
to the nucleotide sequence.
link |
00:29:00.040
And so, you know, again, these protein domains,
link |
00:29:04.880
they are not only functional building blocks,
link |
00:29:07.960
they are also evolutionary building blocks.
link |
00:29:10.040
And so what we see in the sort of,
link |
00:29:12.680
in the later stages of evolution,
link |
00:29:15.200
I mean, once these stable structural
link |
00:29:18.800
and functional building blocks were discovered,
link |
00:29:22.120
they essentially stay, those domains stay as such.
link |
00:29:28.120
So that's why if you start comparing different proteins,
link |
00:29:31.640
you will see that many of them will have similar fragments.
link |
00:29:37.480
And those fragments will correspond to something
link |
00:29:39.680
that we call protein domain families.
link |
00:29:42.280
And so, they are still different
link |
00:29:44.040
because you still have mutations and, you know,
link |
00:29:48.000
the, you know, different mutations are attributed
link |
00:29:52.840
to, you know, diversification of the function
link |
00:29:56.200
of this, you know, protein domain.
link |
00:29:58.880
However, you don't, you very rarely see, you know,
link |
00:30:03.560
the evolutionary events that would split this domain
link |
00:30:08.640
into fragments because, and it's, you know,
link |
00:30:13.040
once you have the domain split,
link |
00:30:17.240
you actually, you know, you can completely cancel
link |
00:30:22.520
out its function, or at the very least, you can reduce it.
link |
00:30:26.440
And that's not, you know, efficient from the point of view
link |
00:30:29.600
of the, you know, of the cell function.
link |
00:30:32.840
So the protein domain level is a very important one.
link |
00:30:39.120
Now, on top of that, right?
link |
00:30:42.000
So if you look at the proteins, right?
link |
00:30:44.080
So you have these structural units
link |
00:30:46.320
and they carry out the function.
link |
00:30:48.160
But then much less is known about things
link |
00:30:51.840
that connect these protein domains.
link |
00:30:54.360
Something that we call linkers, and those linkers
link |
00:30:57.480
are completely flexible, you know, parts of the protein
link |
00:31:02.320
that nevertheless carry out a lot of function.
link |
00:31:06.360
It's like little tails or heads.
link |
00:31:08.040
So, so, so we do have tails.
link |
00:31:09.800
So they're called termini, the C and N termini.
link |
00:31:12.280
So these are things right on the, on one and another ends
link |
00:31:18.320
of the protein sequence.
link |
00:31:20.040
So they are also very important.
link |
00:31:22.560
So they are attributed to very specific interactions
link |
00:31:26.360
between the proteins.
link |
00:31:27.760
So.
link |
00:31:28.600
But you're referring to the links between domains
link |
00:31:30.840
that connect the domains.
link |
00:31:32.640
And, you know, apart from the, just the simple perspective,
link |
00:31:38.480
if you have, you know, a very short domain,
link |
00:31:41.240
you have, sorry, a very short linker,
link |
00:31:43.760
you have two domains next to each other.
link |
00:31:45.880
They are forced to be next to each other.
link |
00:31:47.600
If you have a very long one, you have the domains
link |
00:31:50.440
that are extremely flexible and they carry out
link |
00:31:53.160
a lot of sort of spatial reorganization, right?
link |
00:31:56.920
That's so awesome.
link |
00:31:58.160
But on top of that, right, just this linker itself,
link |
00:32:02.000
because it's so flexible, it actually can adapt
link |
00:32:05.800
to a lot of different shapes.
link |
00:32:07.520
And therefore, it's a very good interactor
link |
00:32:11.120
when it comes to interaction between this protein
link |
00:32:14.040
and other protein, right?
link |
00:32:15.720
So these things also evolve, you know,
link |
00:32:18.960
and they, in a way, have different sort of laws of,
link |
00:32:25.560
the driving laws that underlie the evolution
link |
00:32:30.600
because they no longer need to preserve
link |
00:32:34.560
certain structure, right, unlike protein domains.
link |
00:32:38.920
And so on top of that, you have something
link |
00:32:43.360
that is even less studied.
link |
00:32:45.880
And this is something that is attributed to the concept
link |
00:32:50.480
of alternative splicing.
link |
00:32:53.240
So alternative splicing.
link |
00:32:54.480
So it's a very cool concept, it's something
link |
00:32:57.400
that we've been fascinated about for over a decade
link |
00:33:02.800
in my lab and trying to do research with that.
link |
00:33:05.520
But so typically, a simplistic perspective
link |
00:33:10.520
is that one gene equals one protein product, right?
link |
00:33:16.080
So you have a gene, you know, you transcribe it
link |
00:33:19.800
and translate it and it becomes a protein.
link |
00:33:24.640
In reality, when we talk about eukaryotes,
link |
00:33:28.400
especially sort of more recent eukaryotes
link |
00:33:32.360
that are very complex, the gene is no longer equal
link |
00:33:37.360
to one protein, it actually can produce multiple
link |
00:33:46.200
functionally active protein products.
link |
00:33:51.240
And each of them is called an alternatively spliced product.
link |
00:33:57.960
The reason it happens is that if you look at the gene,
link |
00:34:01.880
it actually has, it has also blocks.
link |
00:34:05.600
And the blocks, some of which,
link |
00:34:08.320
and it's essentially, it goes like this.
link |
00:34:10.680
So we have a block that will later be translated,
link |
00:34:13.880
we call it exon, then we'll have a block
link |
00:34:16.480
that is not translated, cut out, we call it intron.
link |
00:34:20.400
So we have exon, intron, exon, intron, et cetera, et cetera,
link |
00:34:23.560
et cetera, right?
link |
00:34:24.400
So sometimes you can have, you know,
link |
00:34:26.920
dozens of these exons and introns.
link |
00:34:29.880
So what happens is during the process
link |
00:34:32.680
when the gene is converted to RNA,
link |
00:34:37.320
we have things that are cut out,
link |
00:34:41.280
the introns that are cut out, and exons that now
link |
00:34:45.080
get assembled together.
link |
00:34:47.160
And sometimes we will throw out some of the exons.
link |
00:34:52.320
And the resulting protein product will no longer
link |
00:34:54.640
be the same, it will be different, right?
link |
00:34:56.800
So now you have fragments of the protein
link |
00:35:00.000
that are no longer there.
link |
00:35:01.360
They were cut out with the introns.
link |
00:35:03.800
Sometimes you will essentially take one exon
link |
00:35:07.520
and replace it with another one, right?
link |
00:35:09.840
So there's some flexibility in this process.
link |
00:35:12.600
So that creates a whole new level of complexity.
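The exon skipping described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a model of real splicing: the gene, its sequences, and the function names `splice` and `isoforms` are all made up here, and the sketch simply enumerates which internal exons could be skipped, whereas, as the conversation goes on to note, the real process is regulated rather than random.

```python
from itertools import combinations

# Toy gene (hypothetical sequences): exons are kept and later translated,
# introns are cut out. Real genes can have dozens of each.
GENE = [
    ("exon", "ATGGCC"),
    ("intron", "gtaagt"),
    ("exon", "GATTAC"),
    ("intron", "gtcagt"),
    ("exon", "TTTTAA"),
]


def splice(gene, skipped=()):
    """Drop every intron and join the exons, leaving out exon indices in `skipped`."""
    exons = [seq for kind, seq in gene if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i not in skipped)


def isoforms(gene):
    """List the products that keep the first and last exon and skip any subset of internal exons."""
    n_exons = sum(1 for kind, _ in gene if kind == "exon")
    internal = range(1, n_exons - 1)
    products = set()
    for k in range(len(internal) + 1):
        for skipped in combinations(internal, k):
            products.add(splice(gene, skipped))
    return sorted(products)


if __name__ == "__main__":
    for product in isoforms(GENE):
        print(product)
```

With three exons there is one internal exon, so the sketch yields two products from one gene: the full-length one and one with the middle exon skipped.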
link |
00:35:17.200
Cause now...
link |
00:35:18.040
Is this random though?
link |
00:35:18.880
Is it random?
link |
00:35:19.720
It's not random.
link |
00:35:20.840
We, and this is where I think now the appearance
link |
00:35:24.480
of these modern single cell, and before that tissue level,
link |
00:35:29.480
sequencing, next generation sequencing techniques
link |
00:35:31.880
such as RNA-seq, allows us to see that these are the events
link |
00:35:37.200
that often happen in response in, it's a dynamic event
link |
00:35:42.280
that happens in response to disease
link |
00:35:45.960
or in response to certain developmental stage of a cell.
link |
00:35:51.120
And this is an incredibly complex layer
link |
00:35:55.960
that also undergoes, I mean, because it's at the gene level,
link |
00:36:01.360
right?
link |
00:36:02.200
So it undergoes certain evolution, right?
link |
00:36:05.400
And now we have this interplay between what is happening
link |
00:36:10.800
in the protein world and what is happening
link |
00:36:14.080
in the gene and RNA world.
link |
00:36:18.040
And for example, it's often that we see that
link |
00:36:23.040
the boundaries of these exons coincide
link |
00:36:28.200
with the boundaries of the protein domains, right?
link |
00:36:32.160
So there is, you know, close interplay to that.
link |
00:36:36.520
It's not always, I mean, you know,
link |
00:36:37.960
otherwise it would be too simple, right?
link |
00:36:39.840
But we do see the connection
link |
00:36:41.880
between those sort of machineries.
link |
00:36:45.000
And obviously the evolution will pick up this complexity
link |
00:36:49.760
and, you know, select for whatever is successful,
link |
00:36:53.520
whatever is interesting function.
link |
00:36:55.080
We see that complexity in play
link |
00:36:57.560
and makes this question, you know, more complex
link |
00:37:01.200
but more exciting.
link |
00:37:02.280
As a small detour, I don't know if you think about this
link |
00:37:05.240
into the world of computer science.
link |
00:37:07.560
Douglas Hofstadter, I think, came up with the name
link |
00:37:12.920
of Quine, which are, I don't know if you're familiar
link |
00:37:16.200
with these things, but it's computer programs
link |
00:37:18.880
that have, I guess, exon and intron
link |
00:37:22.160
and they copy it.
link |
00:37:23.320
The whole purpose of the program is to copy itself.
link |
00:37:26.240
So it prints copies of itself
link |
00:37:28.480
but can also carry information inside of it.
link |
00:37:31.000
So it's a very kind of crude, fun exercise of,
link |
00:37:36.440
can we sort of replicate these ideas from cells?
link |
00:37:39.880
Or can we have a computer program
link |
00:37:41.480
that when you run it just prints itself,
link |
00:37:45.240
the entirety of itself
link |
00:37:47.080
and does it in different programming languages and so on.
link |
00:37:50.000
I've been playing around and writing them.
link |
00:37:51.920
It's a kind of fun little exercise.
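For readers who have not seen one, here is one classic form of a quine in Python. It is shown as an illustration only, not as the specific program discussed in the conversation. It carries no comments on purpose, since every extra character in the file would also have to be reproduced in the output.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The string `s` is a template of the whole program. `%r` inserts the quoted representation of `s` into itself, and `%%` becomes a single `%`, so running the two lines prints exactly those two lines.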
link |
00:37:53.680
You know, when I was a kid, so, you know,
link |
00:37:55.680
it was essentially one of the sort of main stages
link |
00:38:02.840
in Informatics Olympiad
link |
00:38:06.760
that you have to reach in order to be any good
link |
00:38:10.840
is you should be able to write a program
link |
00:38:14.360
that replicates itself.
link |
00:38:16.680
And so the task then becomes even, you know,
link |
00:38:19.800
sort of more complicated.
link |
00:38:20.920
So what is the shortest?
link |
00:38:22.600
What is the shortest, yeah.
link |
00:38:24.040
And of course, it's, you know,
link |
00:38:25.400
it's a function of a programming language.
link |
00:38:27.480
But yeah, I remember, you know,
link |
00:38:29.280
long, long, long time ago when we tried to, you know,
link |
00:38:33.200
to make it shorter and shorter and find the shortcuts.
link |
00:38:36.560
There's actually on Stack Exchange,
link |
00:38:38.640
there's an entire site called CodeGolf, I think,
link |
00:38:44.120
where the entirety is just a competition.
link |
00:38:46.520
People just come up with whatever task, I don't know,
link |
00:38:50.360
like, write code that reports the weather today.
link |
00:38:54.640
And the competition is about whatever programming language,
link |
00:38:58.640
what is the shortest program?
link |
00:39:00.400
And it makes you actually, people should check it out
link |
00:39:02.240
because it makes you realize there's some weird
link |
00:39:04.920
programming languages out there.
link |
00:39:07.160
But, you know, just to dig on that a little deeper,
link |
00:39:12.720
do you think, you know, in computer science,
link |
00:39:16.080
you don't often think about programs.
link |
00:39:19.280
There's like the machine learning world now
link |
00:39:22.280
that's still kind of basic programs.
link |
00:39:26.280
And then there's humans that replicate themselves, right?
link |
00:39:29.600
And there's these mutations and so on.
link |
00:39:32.800
Do you think we'll ever have a world
link |
00:39:34.480
where there's programs that kind of have
link |
00:39:38.600
an evolutionary process?
link |
00:39:40.640
So I'm not talking about evolutionary algorithms,
link |
00:39:42.640
but I'm talking about programs that kind of
link |
00:39:44.640
mate with each other and evolve and,
link |
00:39:47.200
like, on their own, replicate themselves.
link |
00:39:49.600
So this is kind of, the idea here is, you know,
link |
00:39:54.640
that's how you can have a runaway thing.
link |
00:39:57.120
So we think about machine learning as a system
link |
00:39:59.240
that gets smarter and smarter and smarter and smarter.
link |
00:40:01.440
At least the machine learning systems of today
link |
00:40:04.280
are like, it's a program that you can, like, turn off.
link |
00:40:09.120
As opposed to throwing a bunch of little programs out there
link |
00:40:12.800
and letting them, like, multiply and mate
link |
00:40:15.720
and evolve and replicate.
link |
00:40:17.480
Do you ever think about that kind of world, you know,
link |
00:40:20.560
when we jump from the biological systems
link |
00:40:23.480
that you're looking at to artificial ones?
link |
00:40:27.280
I mean, it's almost like you take the sort of
link |
00:40:31.920
the area of intelligent agents, right?
link |
00:40:34.480
Which are essentially the independent sort of codes
link |
00:40:38.720
that run and interact and exchange the information, right?
link |
00:40:42.560
So I don't see why not.
link |
00:40:45.200
I mean, I, you know, it could be sort of a natural evolution
link |
00:40:48.840
in this area of computer science.
link |
00:40:53.000
I think it's kind of an interesting possibility.
link |
00:40:54.720
It's terrifying, too.
link |
00:40:55.960
But I think it's a really powerful tool.
link |
00:40:58.360
Like, to have agents that, you know,
link |
00:41:00.680
we have social networks with millions of people
link |
00:41:02.800
and they interact.
link |
00:41:03.840
I think it's interesting to inject into that.
link |
00:41:05.720
Bots are already injected into that, right?
link |
00:41:08.400
But those bots are pretty dumb.
link |
00:41:10.040
You know, they're probably pretty dumb algorithms.
link |
00:41:15.680
You know, it's interesting to think that there might be bots
link |
00:41:18.640
that evolve together with humans.
link |
00:41:20.440
And there's the sea of humans and robots
link |
00:41:23.960
that are operating first in the digital space.
link |
00:41:26.520
And then you can also think, I love the idea.
link |
00:41:29.080
Some people worked, I think at Harvard, at Penn,
link |
00:41:32.600
there's robotics labs that, you know,
link |
00:41:36.760
take as a fundamental task to build a robot
link |
00:41:40.600
that, given extra resources,
link |
00:41:42.400
can build another copy of itself,
link |
00:41:44.920
like in the physical space,
link |
00:41:46.560
which is super difficult to do,
link |
00:41:49.400
but super interesting.
link |
00:41:50.920
I remember there's like research on robots
link |
00:41:54.040
that can build a bridge.
link |
00:41:55.240
So they make a copy of themselves
link |
00:41:56.880
and they connect themselves.
link |
00:41:57.960
And so it's like self building bridge
link |
00:42:00.600
based on building blocks.
link |
00:42:02.400
You can imagine like a building that self assembles.
link |
00:42:05.640
So it's basically self assembling structures
link |
00:42:07.600
from robotic parts,
link |
00:42:10.640
but it's interesting to, within that robot,
link |
00:42:13.880
add the ability to mutate and do all the interesting,
link |
00:42:19.840
like little things that you're referring to in evolution
link |
00:42:23.200
to go from a single origin protein building block
link |
00:42:26.320
to like this weird complexity.
link |
00:42:29.000
And if you think about this, I mean, you know,
link |
00:42:30.960
the bits and pieces are there, you know?
link |
00:42:34.600
So you mentioned the evolutionary algorithm, right?
link |
00:42:37.040
You know, so this is sort of,
link |
00:42:38.520
and maybe sort of the goal is in a way different, right?
link |
00:42:44.800
So the goal is to, you know, to essentially,
link |
00:42:47.680
to optimize your search, right?
link |
00:42:50.080
So, but sort of the ideas are there.
link |
00:42:53.040
So people recognize that, you know,
link |
00:42:55.080
that the recombination events
link |
00:42:59.120
lead to global changes in the search trajectories,
link |
00:43:03.280
and a mutation event is a more refined step in the search.
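The two operators just mentioned can be sketched as a tiny genetic algorithm. This is a minimal sketch under assumptions that are not from the conversation: the bit-string encoding, the count-the-ones objective, truncation selection, elitism, and all parameter values are placeholders chosen for illustration.

```python
import random


def fitness(bits):
    """Toy objective (an assumption): count the 1s, so the optimum is all ones."""
    return sum(bits)


def recombine(a, b, rng):
    """One-point crossover: a large jump that mixes material from two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`: a small local step."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=32, pop_size=40, generations=60, rate=0.02, seed=0):
    """Run the loop and return the best individual found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: keep the better half
        children = [
            mutate(recombine(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - 1)
        ]
        pop = [pop[0]] + children  # elitism: the current best always survives
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(fitness(best), "of", len(best))
```

Recombination moves a child far from either parent in one step, while a low mutation rate only nudges it, which is the global versus refined distinction drawn above.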
link |
00:43:10.040
Then you have, you know,
link |
00:43:12.200
other sort of nature inspired algorithms, right?
link |
00:43:16.520
So it's one of the reasons that, you know,
link |
00:43:19.600
I think one of the funnest ones
link |
00:43:22.000
is the slime mold based algorithm, right?
link |
00:43:24.880
So I think the first was introduced by the Japanese group
link |
00:43:29.880
where it was able to solve some pretty, you know,
link |
00:43:34.720
complex problems.
link |
00:43:36.920
So that's, you know, and then I think there are still
link |
00:43:42.320
a lot of things we've yet to, you know,
link |
00:43:46.520
borrow from the nature, right?
link |
00:43:49.000
So there are a lot of sort of ideas that nature,
link |
00:43:54.760
you know, gets to offer us that, you know,
link |
00:43:56.800
it's up to us to grab it and to, you know,
link |
00:44:01.080
get the best use of it.
link |
00:44:02.200
Including neural networks.
link |
00:44:03.680
You know, we have a very crude inspiration
link |
00:44:07.080
from nature on neural networks.
link |
00:44:08.320
Maybe there's other inspirations to be discovered
link |
00:44:10.960
in the brain or other aspects of the various systems,
link |
00:44:16.320
even like the immune system, the way it interplays.
link |
00:44:20.200
I recently started to understand that the immune system
link |
00:44:23.600
has something to do with the way the brain operates.
link |
00:44:26.080
Like there's multiple things going on in there,
link |
00:44:28.400
which all of which are not modeled
link |
00:44:30.560
in artificial neural networks.
link |
00:44:32.160
And maybe if you throw a little bit
link |
00:44:34.000
of that biological spice in there,
link |
00:44:36.240
you'll come up with something, something cool.
link |
00:44:39.040
I'm not sure if you're familiar with the Drake equation.
link |
00:44:43.800
That estimate, I just did a video on it yesterday
link |
00:44:46.760
because I wanted to give my own estimate of it.
link |
00:44:49.320
It's an equation that combines a bunch of factors
link |
00:44:52.400
to estimate how many alien civilizations
link |
00:44:55.840
are in the galaxy.
link |
00:44:57.040
I've heard about it, yes.
link |
00:44:58.520
So one of the interesting parameters,
link |
00:45:01.160
you know, it's like how many stars are born every year,
link |
00:45:06.000
how many planets are on average per star,
link |
00:45:10.720
for this, how many habitable planets are there.
link |
00:45:14.280
And then the one that starts being really interesting
link |
00:45:18.600
is the probability that life emerges on a habitable planet.
link |
00:45:23.600
So like, I don't know if you think about,
link |
00:45:26.640
you certainly think a lot about evolution,
link |
00:45:28.560
but do you think about the thing
link |
00:45:30.000
which evolution doesn't describe,
link |
00:45:31.400
which is like the beginning of evolution,
link |
00:45:33.840
the origin of life.
link |
00:45:35.520
I think I put the probability of life developing
link |
00:45:38.320
on a habitable planet at 1%.
link |
00:45:40.600
This is very scientifically rigorous.
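For reference, the Drake equation multiplies seven factors: N = R* x f_p x n_e x f_l x f_i x f_c x L. The sketch below writes that product out in Python. Only the 1% for f_l comes from the conversation. Every other value in the example call is a neutral placeholder assumed here so the arithmetic is visible, not an estimate from the speakers or the literature.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Return N, the expected number of detectable civilizations in the galaxy.

    r_star   -- average rate of star formation (stars per year)
    f_p      -- fraction of stars that have planets
    n_e      -- habitable planets per star that has planets
    f_l      -- fraction of habitable planets on which life emerges
    f_i      -- fraction of those on which intelligent life develops
    f_c      -- fraction of those that emit detectable signals
    lifetime -- years such a civilization remains detectable
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime


if __name__ == "__main__":
    # f_l = 0.01 is the 1% mentioned above; all other values are placeholders.
    n = drake(r_star=1.0, f_p=1.0, n_e=1.0, f_l=0.01, f_i=1.0, f_c=1.0, lifetime=1000.0)
    print(n)
```

Because the result is a plain product, the estimate scales linearly with each factor: halving the probability that life emerges halves N.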
link |
00:45:43.360
Okay, well, first at a high level for the Drake equation,
link |
00:45:47.360
what would you put that percent at on Earth?
link |
00:45:50.560
And in general, do you have something,
link |
00:45:53.800
do you have thoughts about how life might have started?
link |
00:45:57.240
You know, like the proteins being the first kind of,
link |
00:46:00.200
one of the early jumping points?
link |
00:46:02.040
Yes, also, I think back in 2018,
link |
00:46:06.480
there was a very exciting paper published in Nature
link |
00:46:09.480
where they found one of the simplest amino acids, glycine.
link |
00:46:14.480
One of the simplest amino acids, glycine,
link |
00:46:20.040
in a comet dust.
link |
00:46:23.520
So this is, I apologize if I don't pronounce,
link |
00:46:29.440
it's a Russian named comet.
link |
00:46:32.080
I think Churyumov-Gerasimenko.
link |
00:46:34.840
This is the comet where, and there was this mission
link |
00:46:40.000
to get close to this comet
link |
00:46:44.000
and get the stardust from its tail.
link |
00:46:48.160
And when scientists analyzed it,
link |
00:46:50.640
they actually found traces of glycine,
link |
00:46:56.640
which makes up, it's one of the 20 basic amino acids
link |
00:47:04.200
that makes up proteins, right?
link |
00:47:06.400
So that was kind of very exciting, right?
link |
00:47:10.960
But the question is very interesting, right?
link |
00:47:14.200
So what, if there is some alien life,
link |
00:47:19.880
is it gonna be made of proteins, right?
link |
00:47:22.920
Or maybe RNAs, right?
link |
00:47:24.320
So we see that the RNA viruses
link |
00:47:28.120
are certainly very well established
link |
00:47:32.320
sort of group of molecular machines, right?
link |
00:47:37.320
So yes, it's a very interesting question.
link |
00:47:41.760
What probability would you put?
link |
00:47:43.600
Like, how unlikely just on earth do you think
link |
00:47:48.760
this whole thing is that we got going?
link |
00:47:51.600
Like, are we really lucky or is it inevitable?
link |
00:47:54.640
Like, what's your sense when you sit back
link |
00:47:56.240
and think about life on earth?
link |
00:47:58.840
Is it higher or lower than 1%?
link |
00:48:01.000
Well, because 1% is pretty low,
link |
00:48:02.320
but it still is like, damn, that's a pretty good chance.
link |
00:48:05.040
Yes, it's a pretty good chance.
link |
00:48:06.600
I mean, I would personally, but again,
link |
00:48:10.560
I'm probably not the best person
link |
00:48:14.160
to do such estimations,
link |
00:48:16.640
but intuitively, I would probably put it lower.
link |
00:48:23.080
But still, I mean, you know, give up.
link |
00:48:24.960
We're really lucky here on earth.
link |
00:48:28.560
Or the conditions are really good.
link |
00:48:30.480
I mean, I think that there was everything was right
link |
00:48:34.480
in a way, right?
link |
00:48:35.480
So we still, the conditions were not like ideal
link |
00:48:39.720
if you try to look at what was, you know,
link |
00:48:44.040
several billion years ago when life emerged.
link |
00:48:48.320
So there is something called the rare earth hypothesis
link |
00:48:52.040
that, you know, in counter to the Drake equation says
link |
00:48:55.720
that the, you know, the conditions of earth,
link |
00:49:00.240
if you actually were to describe earth,
link |
00:49:03.280
it's quite a special place.
link |
00:49:05.720
So special, it might be unique in our galaxy
link |
00:49:09.120
and potentially, you know, close to unique
link |
00:49:11.760
in the entire universe.
link |
00:49:12.880
Like it's very difficult to reconstruct
link |
00:49:14.720
those same conditions.
link |
00:49:16.320
And what the rare earth hypothesis argues
link |
00:49:19.520
is all those different conditions are essential for life.
link |
00:49:23.040
And so that's sort of the counter, you know,
link |
00:49:26.120
like all the things we, you know,
link |
00:49:29.200
thinking that earth is pretty average.
link |
00:49:31.520
I mean, I can't really, I'm trying to remember
link |
00:49:34.080
to go through all of them, but just the fact
link |
00:49:36.160
that it is shielded from a lot of asteroids,
link |
00:49:41.920
obviously the distance to the sun,
link |
00:49:43.840
but also the fact that it's like a perfect balance
link |
00:49:48.280
between the amount of water and land
link |
00:49:52.200
and all those kinds of things.
link |
00:49:53.680
I don't know, there's a bunch of different factors
link |
00:49:55.200
that I don't remember.
link |
00:49:56.320
There's a long list, but it's fascinating to think about
link |
00:49:58.720
if in order for something like proteins
link |
00:50:03.640
and then the DNA and RNA to emerge,
link |
00:50:06.600
you need, and basic living organisms,
link |
00:50:10.520
you need a very close, Earth like planet,
link |
00:50:15.000
which would be sad or exciting.
link |
00:50:18.760
I don't know.
link |
00:50:19.760
If you ask me, I, you know,
link |
00:50:21.960
in a way I put a parallel between, you know,
link |
00:50:25.560
between our own research and, I mean,
link |
00:50:30.200
from the intuitive perspective, you know,
link |
00:50:34.080
you have those two extremes
link |
00:50:36.760
and the reality never, or very rarely, falls into the extremes.
link |
00:50:41.960
The optimum is always reached somewhere
link |
00:50:45.240
in between.
link |
00:50:46.560
So I would, and that's what I tend to think.
link |
00:50:50.080
I think that, you know, we're probably somewhere in between.
link |
00:50:54.080
So we're not unique, unique,
link |
00:50:57.000
but again, the chances are, you know, reasonably small.
link |
00:51:02.000
The problem is we don't know what the other extreme is.
link |
00:51:04.400
Like I tend to think that we don't actually understand
link |
00:51:08.080
the basic mechanisms of, like, what this all originated from.
link |
00:51:12.920
Like it seems like we think of life as this distinct thing,
link |
00:51:16.200
maybe intelligence is a distinct thing,
link |
00:51:18.560
maybe the physics from which planets and suns are born
link |
00:51:23.160
is a distinct thing, but that could be a very,
link |
00:51:25.960
it's like the Stephen Wolfram thing.
link |
00:51:27.480
It's like the, from simple rules,
link |
00:51:29.120
emerges greater and greater complexity.
link |
00:51:31.040
So, you know, I tend to believe that just life finds a way.
link |
00:51:35.800
It, like, we don't know the extreme of how common life is
link |
00:51:39.600
because it could be life is like everywhere.
link |
00:51:45.000
Like, like so everywhere that it's almost like laughable.
link |
00:51:49.440
Like that we're such idiots to think,
link |
00:51:51.520
or you, like it's like ridiculous to even like think.
link |
00:51:56.280
It's like ants thinking that their little colony
link |
00:51:59.440
is the unique thing and everything else doesn't exist.
link |
00:52:03.200
I mean, it's also very possible that that's the extreme.
link |
00:52:07.520
And we're just not able to maybe comprehend
link |
00:52:09.880
the nature of that life.
link |
00:52:11.960
I mean, just to stick on alien life
link |
00:52:14.440
for just a brief moment more,
link |
00:52:16.240
there are some signs of life on Venus in gaseous form.
link |
00:52:22.240
There's hope for life on Mars, probably extinct.
link |
00:52:28.120
We're not talking about intelligent life.
link |
00:52:30.840
Although that has been in the news recently.
link |
00:52:33.800
We're talking about basic, like, you know, bacteria.
link |
00:52:37.000
A lot of bacteria.
link |
00:52:37.840
Yeah.
link |
00:52:38.680
And then also, I guess, there's a couple moons that I guess.
link |
00:52:42.200
Europa.
link |
00:52:43.040
Yeah, Europa, which is Jupiter's moon.
link |
00:52:46.040
I think there's another one.
link |
00:52:47.480
Are you, is that exciting?
link |
00:52:50.120
Or is it terrifying to you that we might find life?
link |
00:52:53.040
Do you hope we find life?
link |
00:52:54.520
I certainly do hope that we find life.
link |
00:52:57.960
I mean, it was very exciting to hear about, you know,
link |
00:53:02.760
this news about the possible life on Venus.
link |
00:53:09.280
It'd be nice to have hard evidence of something,
link |
00:53:12.560
which is what the hope is for Mars.
link |
00:53:15.400
And Europa, but do you think those organisms
link |
00:53:18.440
would be similar biologically?
link |
00:53:20.800
Or would they even be sort of carbon based?
link |
00:53:23.960
If we do find them?
link |
00:53:25.760
I would say they would be carbon based.
link |
00:53:28.920
How similar?
link |
00:53:30.320
It's a big question, right?
link |
00:53:31.840
So it's the moment we discover things outside Earth, right?
link |
00:53:39.560
Even if it's a tiny little single cell.
link |
00:53:42.400
I mean, there is so much.
link |
00:53:45.400
Just imagine that.
link |
00:53:46.360
That would be so.
link |
00:53:47.720
I think that that would be another turning point
link |
00:53:50.720
for the science, you know?
link |
00:53:52.040
And if, especially if it's different
link |
00:53:54.160
in some very new way, that's exciting.
link |
00:53:57.040
Cause that says, that's a definitive statement,
link |
00:53:59.720
not a definitive, but a pretty strong statement
link |
00:54:01.800
that life is everywhere in the universe.
link |
00:54:05.440
To me, at least that's really exciting.
link |
00:54:07.740
You brought up Joshua Lederberg
link |
00:54:11.540
in an offline conversation.
link |
00:54:13.500
I think I'd love to talk to you about AlphaFold.
link |
00:54:15.820
And this might be an interesting way
link |
00:54:17.220
to enter that conversation because,
link |
00:54:20.380
so he won the 1958 Nobel Prize in Physiology or Medicine
link |
00:54:24.500
for discovering that bacteria can mate and exchange genes.
link |
00:54:29.020
But he also did a ton of other stuff,
link |
00:54:32.220
like we mentioned, helping NASA find life on Mars,
link |
00:54:37.780
and Dendral, the chemical expert system.
link |
00:54:45.260
Expert systems, remember those?
link |
00:54:47.900
Do you, what do you find interesting about this guy
link |
00:54:51.380
and his ideas about artificial intelligence in general?
link |
00:54:54.980
So I have a kind of personal story to share.
link |
00:54:59.980
So I started my PhD in Canada back in 2000.
link |
00:55:05.180
And so essentially my PhD was,
link |
00:55:07.780
so we were developing sort of a new language
link |
00:55:10.140
for symbolic machine learning.
link |
00:55:12.580
So it's different from the feature based machine learning.
link |
00:55:15.140
And one of the sort of cleanest applications
link |
00:55:19.860
of this approach, of this formalism,
link |
00:55:24.020
was to cheminformatics and computer aided drug design.
link |
00:55:28.660
So essentially, as a part of my research,
link |
00:55:33.820
I developed a system that essentially looked
link |
00:55:37.420
at chemical compounds of, say,
link |
00:55:40.860
the same therapeutic category, male hormones, right?
link |
00:55:46.380
And tried to figure out the structural fragments
link |
00:55:52.500
that are the structural building blocks
link |
00:55:55.180
that are important that define this class
link |
00:55:58.860
versus structural building blocks
link |
00:56:00.500
that are there just because, to complete the structure.
link |
00:56:04.980
But they are not essentially the ones
link |
00:56:06.740
that make up the chemical,
link |
00:56:08.820
the key chemical properties of this therapeutic category.
link |
00:56:13.500
And for me, it was something new.
link |
00:56:17.540
I was trained as an applied mathematician
link |
00:56:20.660
with some machine learning background,
link |
00:56:23.700
but computer aided drug design
link |
00:56:25.820
was completely new territory.
link |
00:56:28.380
So because of that, I often found myself asking
link |
00:56:32.620
lots of questions on one of these sort of central forums.
link |
00:56:37.660
Back then, there were no Facebooks or stuff like that.
link |
00:56:41.060
There was a forum.
link |
00:56:42.380
It's a forum, it's essentially, it's like a bulletin board.
link |
00:56:46.500
On the internet.
link |
00:56:48.100
Yeah, so essentially you have a bunch of people
link |
00:56:50.980
and you post the question
link |
00:56:52.460
and you get an answer from different people.
link |
00:56:55.860
And back then, one of the most popular forums was CCL.
link |
00:57:01.300
I think Computational Chemistry Library,
link |
00:57:05.420
not library, but something like that.
link |
00:57:07.060
But CCL, that was the forum.
link |
00:57:09.820
And there I asked a lot of dumb questions.
link |
00:57:14.020
Yes, I asked questions.
link |
00:57:15.460
I also shared some information about our formalism
link |
00:57:20.460
and how we do and whether whatever we do makes sense.
link |
00:57:25.100
And so, I remember that one of these posts,
link |
00:57:29.180
I mean, I still remember I would call it desperately
link |
00:57:34.900
looking for a chemist's advice, something like that.
link |
00:57:40.740
And so I posed my question.
link |
00:57:42.500
I explained what our formalism is, what it does
link |
00:57:47.500
and what kind of applications I'm planning to do.
link |
00:57:53.220
And it was in the middle of the night
link |
00:57:56.260
and I went back to bed.
link |
00:57:59.660
And next morning, I have a phone call from my advisor
link |
00:58:04.820
who also looked at this forum.
link |
00:58:06.940
It's like, you won't believe who replied to you.
link |
00:58:11.060
And it's like, who?
link |
00:58:13.940
He said, well, there is a message to you
link |
00:58:16.540
from Joshua Lederberg.
link |
00:58:19.140
And my reaction was like, who is Joshua Lederberg?
link |
00:58:22.580
And your advisor hung up.
link |
00:58:27.460
So essentially, Joshua wrote me that we had conceptually
link |
00:58:33.100
similar ideas in the Dendral project.
link |
00:58:36.660
You may wanna look it up.
link |
00:58:39.340
And we should also, sorry, and it's a side comment say
link |
00:58:42.860
that even though he won the Nobel Prize at a really young age
link |
00:58:47.220
in 58, but so.
link |
00:58:49.380
He was, I think he was what, 33.
link |
00:58:52.380
Yeah, it's just crazy.
link |
00:58:53.980
So anyway, so that's, so hence in the 90s,
link |
00:58:57.660
responding to young whippersnappers on the CCL forum.
link |
00:59:02.100
Okay.
link |
00:59:02.940
And so back then he was already very senior.
link |
00:59:05.860
I mean, he unfortunately passed away back in 2008.
link |
00:59:09.620
But back in 2001, he was a professor emeritus
link |
00:59:14.180
at Rockefeller University.
link |
00:59:15.980
And that was actually, believe it or not,
link |
00:59:18.500
one of the reasons I decided to join as a postdoc,
link |
00:59:26.780
the group of Andrej Sali, who was at Rockefeller University
link |
00:59:30.820
with the hope that I could actually have a chance
link |
00:59:36.020
to meet Joshua in person.
link |
00:59:37.980
And I met him very briefly, right?
link |
00:59:40.740
The, just because he was walking,
link |
00:59:44.740
you know, there's a little bridge that connects the sort
link |
00:59:48.100
of the research campus with the sort of skyscrapers
link |
00:59:54.260
that Rockefeller owns.
link |
00:59:55.500
Where postdocs and faculty and graduate students live.
link |
01:00:00.500
And so, so I met him, you know,
link |
01:00:02.460
and I had a very short conversation, you know.
link |
01:00:06.340
But so I started, you know, reading about Dendral
link |
01:00:10.380
and I was amazed, you know, it's,
link |
01:00:12.660
we're talking about 1960, right?
link |
01:00:16.100
The ideas were so profound.
link |
01:00:19.260
Well, what's the fundamental ideas of it?
link |
01:00:21.100
The reason to make this is even crazier.
link |
01:00:25.020
So, so, so Lederberg wanted to make a system
link |
01:00:29.860
that would help him study the extraterrestrial molecules, right?
link |
01:00:38.740
So, so the idea was that, you know,
link |
01:00:40.940
the way you study the extraterrestrial molecules
link |
01:00:43.380
is you do the mass spec analysis, right?
link |
01:00:46.780
And so the mass spec gives you sort of bits,
link |
01:00:49.700
numbers about essentially gives you the ideas
link |
01:00:52.620
about the possible fragments or, you know, atoms,
link |
01:00:56.620
and, you know, and maybe a little fragments,
link |
01:00:59.820
pieces of this molecule that make up the molecule, right?
link |
01:01:03.620
So now you need to sort of to decompose this information
link |
01:01:09.220
and to figure out what was the whole
link |
01:01:12.460
before, you know, it became fragments, bits and pieces, right?
link |
01:01:17.660
So, so in order to make this, you know, to have this tool,
link |
01:01:22.660
the idea of Lederberg was to connect chemistry,
link |
01:01:29.780
computer science, and to design this so called expert system
link |
01:01:36.100
that looks, that takes into account,
link |
01:01:38.180
that takes as an input the mass spec data,
link |
01:01:42.180
the possible, the database of possible molecules,
link |
01:01:47.980
and essentially try to sort of induce
link |
01:01:51.860
the molecule that would correspond to this spectra,
link |
01:01:55.580
or, you know, essentially what this project ended up being
link |
01:02:03.020
was that, you know, it would provide a list of candidates
link |
01:02:07.060
that then a chemist would look at and make final decision.
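As a rough illustration of the "list of candidates" idea described here, the following is a minimal sketch, not Dendral's actual algorithm. It assumes integer (nominal) atomic masses, only the elements C, H, N and O, and a crude hydrogen-count limit; the function name `candidate_formulas` and its parameters are hypothetical.

```python
# Nominal (integer) atomic masses; an assumption made to keep the sketch simple.
ATOM_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}


def candidate_formulas(target_mass, max_atoms=10):
    """List C/H/N/O atom counts whose nominal mass equals target_mass.

    Hydrogens fill whatever mass is left over, capped by a crude
    valence-style limit (2*C + N + 2) so absurd formulas are skipped.
    """
    candidates = []
    for c in range(max_atoms + 1):
        for n in range(max_atoms + 1):
            for o in range(max_atoms + 1):
                heavy = (c * ATOM_MASS["C"] + n * ATOM_MASS["N"]
                         + o * ATOM_MASS["O"])
                hydrogens = (target_mass - heavy) // ATOM_MASS["H"]
                if 0 <= hydrogens <= 2 * c + n + 2:
                    candidates.append({"C": c, "H": hydrogens, "N": n, "O": o})
    return candidates


# A chemist (or a later ranking step) would then inspect this short list.
water_like = candidate_formulas(18)
methane_like = candidate_formulas(16)
```

The point of the sketch is the division of labor: the program narrows the search space to formulas consistent with the measured mass, and a human expert makes the final call.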
link |
01:02:11.940
So.
link |
01:02:12.780
But the original idea is supposed to solve the entirety
link |
01:02:15.060
of this problem automatically.
link |
01:02:16.820
Yes, yes.
link |
01:02:17.660
No, so he, you know, so he, back then he approached,
link |
01:02:25.180
yes, believed that, you know, it's amazing.
link |
01:02:28.940
I mean, it still blows my mind, you know, that it's,
link |
01:02:32.220
that's, and this was essentially the origin
link |
01:02:37.380
of the modern bioinformatics, cheminformatics,
link |
01:02:41.140
you know, back in the 60s.
link |
01:02:42.780
So that's, you know, so every time you deal
link |
01:02:47.100
with projects like this, with the, you know,
link |
01:02:50.020
research like this, you just, you know,
link |
01:02:52.580
so the power of the, you know, intelligence
link |
01:02:57.020
of these people is just, you know, overwhelming.
link |
01:03:01.740
What do you think about expert systems?
link |
01:03:04.180
Is there, and why they kind of didn't become successful,
link |
01:03:10.380
especially in the space of bioinformatics,
link |
01:03:12.500
where it does seem like there is a lot of expertise
link |
01:03:15.380
in humans and, you know, it's possible to see
link |
01:03:20.100
that a system like this could be made very useful.
link |
01:03:23.620
Right.
link |
01:03:24.460
So it's actually, it's a great question.
link |
01:03:26.940
And this is something so, you know, so, you know,
link |
01:03:30.500
at my university, I teach artificial intelligence
link |
01:03:33.940
and, you know, we start the, my first two lectures
link |
01:03:38.020
are on the history of AI.
link |
01:03:40.180
And there we, you know, we try to, you know,
link |
01:03:44.420
go through the main stages of AI.
link |
01:03:46.980
And so, you know, the question of why expert systems failed
link |
01:03:53.100
or became obsolete is actually a very interesting one.
link |
01:03:57.300
And there are, you know, if you try to read the, you know,
link |
01:04:00.700
the historical perspectives, there are actually two lines
link |
01:04:03.540
of thoughts.
link |
01:04:04.420
One is that they were essentially not up to the expectations
link |
01:04:09.420
and so therefore they were replaced, you know,
link |
01:04:12.460
by, by other things.
link |
01:04:14.860
Right.
link |
01:04:15.700
The other one was that completely opposite one
link |
01:04:20.180
that they were too good.
link |
01:04:22.580
And as a result, they essentially became sort of
link |
01:04:26.780
a household name and then essentially they got transformed.
link |
01:04:31.820
I mean, they, in both cases,
link |
01:04:33.860
sort of they were replaced by other things, right.
link |
01:04:37.140
I mean, they, in both cases, sort of the outcome
link |
01:04:40.020
was the same, they evolved into something.
link |
01:04:42.380
Yeah.
link |
01:04:43.300
Right.
link |
01:04:44.140
And that's what I, you know, if, if I look at this, right.
link |
01:04:47.700
So the modern machine learning, right.
link |
01:04:50.180
So.
link |
01:04:51.020
So there's echoes in the modern machine learning.
link |
01:04:53.220
I think so.
link |
01:04:54.140
I think so.
link |
01:04:54.980
Because, you know, if, if you think about this, you know,
link |
01:04:57.180
and how we design, you know, the most successful algorithms
link |
01:05:02.500
including AlphaFold, right.
link |
01:05:04.100
You build in the knowledge about the domain that you study.
link |
01:05:09.660
Right.
link |
01:05:10.500
So, so you build in your expertise.
link |
01:05:12.860
So speaking of AlphaFold,
link |
01:05:14.420
so DeepMind's AlphaFold 2 recently was announced to have
link |
01:05:18.900
quote unquote, solved protein folding.
link |
01:05:22.500
How exciting is this to you?
link |
01:05:24.180
It seems to be one of the, one of the exciting things
link |
01:05:28.260
that have happened in 2020.
link |
01:05:29.620
It's an incredible accomplishment from the looks of it.
link |
01:05:32.300
What part of it is amazing to you?
link |
01:05:33.820
What part would you say is overhyped or maybe misunderstood?
link |
01:05:38.980
It's definitely a very exciting achievement
link |
01:05:41.860
to give you a little bit of perspective, right.
link |
01:05:43.740
So, so in bioinformatics, we have several competitions.
link |
01:05:49.940
And so the way, you know, you often hear how those competitions
link |
01:05:55.340
have been explained to sort of to non bioinformaticians
link |
01:05:59.260
as they, you know, they call it bioinformatics Olympic games.
link |
01:06:01.900
And there are several disciplines, right.
link |
01:06:03.620
So, so the, the, the historical one of the first one
link |
01:06:07.060
was the discipline in predicting the protein structure,
link |
01:06:10.340
predicting the 3D coordinates of the protein,
link |
01:06:12.620
but there are some others.
link |
01:06:13.620
So the predicting protein functions,
link |
01:06:16.780
predicting effects of mutations on protein functions,
link |
01:06:21.500
then predicting protein-protein interactions.
link |
01:06:24.940
So, so the original one was CASP,
link |
01:06:28.140
or Critical Assessment of protein Structure Prediction.
link |
01:06:31.540
And the, you know, typically what happens
link |
01:06:40.020
during these competitions is, you know, scientists,
link |
01:06:43.980
experimental scientists solve the, the structures,
link |
01:06:48.380
but don't put them into the Protein Data Bank,
link |
01:06:51.700
which is the centralized database
link |
01:06:54.700
that contains all the 3D coordinates.
link |
01:06:57.260
Instead, they hold it and release protein sequences.
link |
01:07:02.300
And now the challenge of the community
link |
01:07:05.420
is to predict the 3D structures of these proteins
link |
01:07:10.140
and then use the experimentally solved structures
link |
01:07:12.940
to assess which one is the closest one, right.
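The assessment step described here can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain root-mean-square deviation (RMSD) on coordinates that are assumed to be already superimposed, whereas CASP's actual scoring relies on other measures; the names `rmsd` and `closest_prediction` and the toy coordinates are hypothetical.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes both structures are already superimposed in the same frame.
    """
    if len(predicted) != len(experimental):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(
        (p - e) ** 2
        for atom_p, atom_e in zip(predicted, experimental)
        for p, e in zip(atom_p, atom_e)
    )
    return math.sqrt(squared / len(predicted))


def closest_prediction(predictions, experimental):
    """Return the name of the submission with the lowest RMSD to the held-back structure."""
    return min(predictions, key=lambda name: rmsd(predictions[name], experimental))


# Toy data: two atoms, two competing groups (all values are made up).
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predictions = {
    "group_a": [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)],
    "group_b": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)],
}
best = closest_prediction(predictions, experimental)
```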
link |
01:07:16.620
And this competition, by the way,
link |
01:07:17.740
just a bunch of different tangents.
link |
01:07:19.500
So maybe you can also say what is protein folding
link |
01:07:22.860
then this competition CASP competition is,
link |
01:07:25.740
has become the gold standard
link |
01:07:27.420
and that's what was used to say
link |
01:07:29.500
that protein folding was solved.
link |
01:07:32.420
So I just added a little, just a bunch.
link |
01:07:35.300
So if you can, whenever you say stuff,
link |
01:07:37.660
maybe throw in some of the basics for the folks
link |
01:07:39.940
that might be outside of the field.
link |
01:07:41.540
Anyway, sorry.
link |
01:07:42.380
So, yeah, so, you know, so the reason it's, you know,
link |
01:07:45.900
it's relevant to our understanding of protein folding
link |
01:07:50.260
is because, you know, we, we've yet to learn
link |
01:07:54.140
how the folding mechanistically works, right.
link |
01:07:58.140
So there are different hypotheses
link |
01:08:00.740
what happens to this fold.
link |
01:08:02.780
For example, there is a hypothesis
link |
01:08:05.940
that the folding happens by, you know,
link |
01:08:09.780
also in the modular fashion, right.
link |
01:08:12.660
So that, you know, we have protein domains
link |
01:08:16.220
that get folded independently
link |
01:08:17.940
because the structure is stable
link |
01:08:19.700
and then the whole protein structure gets formed.
link |
01:08:23.380
But, you know, within those domains,
link |
01:08:25.380
we also have so called secondary structure,
link |
01:08:27.460
the small alpha helices, beta sheets.
link |
01:08:29.820
So these are, you know, elements that are structurally stable.
link |
01:08:34.300
And so, and the question is, you know,
link |
01:08:37.780
when they, when do they get formed?
link |
01:08:40.300
Because some of the secondary structure elements,
link |
01:08:42.580
you have to have, you know, a fragment in the beginning
link |
01:08:46.460
and say the fragment in the middle, right.
link |
01:08:49.420
So you cannot potentially start having the full fold
link |
01:08:54.820
from the get go, right.
link |
01:08:57.140
So it's still, you know, it's still a big enigma.
link |
01:09:00.340
What happens, we know that it's an extremely efficient
link |
01:09:04.300
and stable process, right.
link |
01:09:05.700
So there's this long sequence
link |
01:09:07.700
and the fold happens really quickly.
link |
01:09:09.540
Exactly.
link |
01:09:10.380
So that's really weird, right.
link |
01:09:11.700
And it happens like the same way almost every time.
link |
01:09:15.260
Exactly, exactly.
link |
01:09:16.580
That's really weird.
link |
01:09:17.900
That's freaking weird.
link |
01:09:19.100
It's, yeah, that's why it's such an amazing thing.
link |
01:09:22.900
But most importantly, right, so it's, you know,
link |
01:09:24.980
so when you see the, you know, the translation process, right.
link |
01:09:29.260
So when you don't have the whole protein translated,
link |
01:09:36.100
right, it's still being translated, you know,
link |
01:09:39.220
getting out from the ribosome,
link |
01:09:41.180
you already see some structural, you know, fragmentation.
link |
01:09:45.780
So, so folding starts happening
link |
01:09:49.340
before the whole protein gets produced, right.
link |
01:09:53.100
And so this is, this is obviously, you know,
link |
01:09:55.100
one of the biggest questions in, you know,
link |
01:09:59.260
in modern molecular biology.
link |
01:10:01.020
Not, not like maybe what happens.
link |
01:10:04.180
Like that's not, that's bigger than the question of folding.
link |
01:10:07.900
That's the question of like,
link |
01:10:09.580
so like deeper fundamental idea of folding.
link |
01:10:12.460
Yes.
link |
01:10:13.300
Behind folding.
link |
01:10:14.140
Exactly.
link |
01:10:14.980
You know, so obviously if we are able to predict
link |
01:10:21.300
the end product of protein folding,
link |
01:10:24.020
we are one step closer to understanding
link |
01:10:27.620
sort of the mechanisms of the protein folding.
link |
01:10:30.180
Because we can then potentially look and start probing
link |
01:10:34.660
what are the critical parts of this process
link |
01:10:38.220
and what are not so critical parts of this process.
link |
01:10:41.220
So we can start decomposing this, you know,
link |
01:10:44.380
so in a way this protein structure prediction algorithm
link |
01:10:50.100
can be used as a tool, right.
link |
01:10:53.740
So you change the, you know, you modify the protein,
link |
01:10:59.260
you get back to this tool, it predicts,
link |
01:11:02.420
okay, it's completely unstable.
link |
01:11:04.980
Yeah, which aspects of the input
link |
01:11:07.820
will have a big impact on the output.
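The "modify the protein, ask the tool again" loop can be sketched as a single-substitution scan. This is a hedged illustration only: `score_fn` stands in for any structure or stability predictor, and `toy_score` below is a made-up placeholder, not a real model.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutation_scan(sequence, score_fn):
    """For each position, return the largest score change over all single substitutions."""
    baseline = score_fn(sequence)
    impact = []
    for i, original in enumerate(sequence):
        largest = 0.0
        for aa in AMINO_ACIDS:
            if aa == original:
                continue
            mutant = sequence[:i] + aa + sequence[i + 1:]
            largest = max(largest, abs(score_fn(mutant) - baseline))
        impact.append(largest)
    return impact


def toy_score(sequence):
    """Hypothetical stand-in predictor: 'stable' only if position 1 is cysteine."""
    return 1.0 if sequence[1] == "C" else 0.0


# Position 1 is the only one whose change affects the toy score.
impact = mutation_scan("ACA", toy_score)
```

Positions with a large impact are the candidates for "critical parts" of the sequence; positions with zero impact are, as far as the predictor can tell, not critical.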
link |
01:11:09.900
Exactly, exactly.
link |
01:11:11.180
So what happens is, you know, we typically have
link |
01:11:14.460
some sort of incremental advancement.
link |
01:11:18.700
You know, each stage of this CASP competition,
link |
01:11:22.580
you have groups with incremental advancement.
link |
01:11:25.340
And, you know, historically the top performing groups
link |
01:11:29.860
were, you know, they were not using machine learning.
link |
01:11:34.420
They were using very advanced biophysics
link |
01:11:37.700
combined with bioinformatics,
link |
01:11:39.580
combined with, you know, the data mining.
link |
01:11:43.220
And that was, you know, that would enable them
link |
01:11:47.340
to obtain protein structures of those proteins
link |
01:11:52.660
that don't have any structurally solved relatives.
link |
01:11:57.540
Because, you know, if we have another protein,
link |
01:12:01.860
say the same protein, but coming from a different species,
link |
01:12:05.620
we could potentially derive some ideas,
link |
01:12:10.500
and that's so called homology or comparative modeling,
link |
01:12:13.260
where we'll derive some ideas
link |
01:12:15.300
from the previously known structures.
link |
01:12:17.580
And that would help us tremendously in, you know,
link |
01:12:21.460
in reconstructing the 3D structure overall.
link |
01:12:25.420
But what happens when we don't have these relatives?
link |
01:12:27.940
This is when it becomes really, really hard, right?
link |
01:12:31.260
So that's so-called de novo, you know,
link |
01:12:35.220
de novo protein structure prediction.
link |
01:12:37.620
And in this case, those methods were traditionally very good.
link |
01:12:43.060
But what happened in the last year,
link |
01:12:46.300
the original AlphaFold came in,
link |
01:12:50.660
and all of a sudden it's much better than everyone else.
link |
01:12:56.420
This is 2018.
link |
01:12:57.900
Yeah.
link |
01:12:58.740
Oh, the competition is only every two years, I think.
link |
01:13:02.140
And then, so, you know, it was sort of
link |
01:13:06.580
kind of a shockwave to the bioinformatics community
link |
01:13:10.180
that, you know, we have like a state of the art
link |
01:13:13.340
machine learning system that does, you know,
link |
01:13:17.180
structure prediction.
link |
01:13:18.460
And essentially what it does, you know,
link |
01:13:20.780
so, you know, if you look at this,
link |
01:13:23.620
it actually predicts the contacts.
link |
01:13:26.140
So, you know, so the process of reconstructing
link |
01:13:29.460
the 3D structure starts by predicting the contacts
link |
01:13:34.700
between the different parts of the protein.
link |
01:13:38.860
And the contacts essentially are the parts of the proteins
link |
01:13:40.980
that are in a close proximity to each other.
link |
01:13:43.220
Right.
link |
01:13:44.060
So actually the machine learning part seems to be
link |
01:13:47.020
estimating, you can correct me if I'm wrong here,
link |
01:13:51.060
but it seems to be estimating the distance matrix,
link |
01:13:53.180
which is like the distance between the different parts.
link |
01:13:55.900
Yeah.
link |
01:13:56.740
So we call it the contact map.
link |
01:13:58.060
Contact map.
link |
01:13:58.900
So once you have the contact map,
link |
01:14:00.580
the reconstruction is becoming more straightforward.
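To make the contact map concrete, here is a minimal sketch that goes in the easy direction, from known 3D coordinates to a contact map (prediction methods go the other way). The 8 angstrom cutoff and the one-point-per-residue representation are assumptions chosen for illustration; `contact_map` is a hypothetical helper name.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return an n x n boolean matrix: True where two residues lie within threshold.

    coords is a list of (x, y, z) tuples, one representative point per residue.
    """
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) <= threshold for j in range(n)]
        for i in range(n)
    ]


# Three toy residues on a line: the first two are close, the third is far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
```

The matrix is symmetric, and every residue is trivially in contact with itself; the informative entries are the off-diagonal ones between residues that are far apart in sequence but close in space.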
link |
01:14:03.900
Yeah.
link |
01:14:04.740
Right.
link |
01:14:05.580
But so the contact map is the key.
link |
01:14:06.780
And so, you know, so that's what happened.
link |
01:14:11.260
And now we started seeing in this current stage, right?
link |
01:14:15.980
Well, in the most recent one,
link |
01:14:18.500
we started seeing the emergence of these ideas
link |
01:14:22.020
in other people's works, right?
link |
01:14:25.060
But yet here's, you know, AlphaFold 2
link |
01:14:29.500
that again outperforms everyone else.
link |
01:14:33.380
And also by introducing yet another wave
link |
01:14:35.780
of the machine learning ideas.
link |
01:14:38.660
Yeah.
link |
01:14:39.500
There does seem to be also an incorporation.
link |
01:14:41.260
First of all, the paper is not out yet,
link |
01:14:43.020
but there's a bunch of ideas already out.
link |
01:14:44.860
There does seem to be an incorporation of this other thing.
link |
01:14:48.100
I don't know if it's something that you could speak to,
link |
01:14:50.180
which is like the incorporation of like other structures,
link |
01:14:58.180
like evolutionarily similar structures that are used
link |
01:15:02.300
to kind of give you hints.
link |
01:15:03.820
Yes.
link |
01:15:04.660
So evolutionary similarity is something
link |
01:15:08.380
that we can detect at different levels, right?
link |
01:15:10.740
So we know, for example, that the structure of proteins
link |
01:15:15.740
is more conserved than the sequence.
link |
01:15:20.380
The sequence could be very different,
link |
01:15:22.180
but the structural shape is actually still very conserved.
link |
01:15:26.140
So that's sort of the intrinsic property
link |
01:15:28.060
that, you know, in a way related to protein folds,
link |
01:15:30.980
you know, to the evolution of the, you know,
link |
01:15:33.980
of the protein of proteins and protein domains, et cetera.
link |
01:15:37.660
But we know that.
link |
01:15:38.500
I mean, there've been multiple studies.
link |
01:15:40.900
And, you know, ideally if you have structures,
link |
01:15:45.180
you know, you should use that information.
link |
01:15:48.580
However, sometimes we don't have this information.
link |
01:15:51.180
Instead, we have a bunch of sequences.
link |
01:15:53.180
Sequences we have a lot, right?
link |
01:15:54.860
So we have, you know, hundreds, thousands of,
link |
01:16:01.180
you know, different organisms sequenced, right?
link |
01:16:04.260
And by taking the same protein,
link |
01:16:07.820
but in different organisms and aligning it,
link |
01:16:11.580
so making it, you know, making the corresponding positions
link |
01:16:15.940
aligned, we can actually say a lot about sort of
link |
01:16:22.180
what is conserved in this protein.
link |
01:16:24.140
And therefore, you know, structurally more stable,
link |
01:16:26.860
what is diverse in this protein.
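The idea of reading conservation off an alignment can be sketched as follows. This is a simplified illustration: it assumes the sequences are already aligned to equal length (gaps written as "-") and scores each column by the fraction of sequences sharing the most common character; real pipelines use more refined measures. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(aligned_sequences):
    """Per-column conservation: share of sequences carrying the most common character."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned_sequences):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Toy alignment of the "same protein" from four organisms (made-up sequences).
alignment = ["MKTA", "MRTA", "MKSA", "MKT-"]
scores = column_conservation(alignment)
```

Columns scoring near 1.0 are the conserved positions; lower scores mark the positions that vary between organisms.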
link |
01:16:28.860
So on top of that, we could provide sort of the information
link |
01:16:32.340
about the sort of the secondary structure
link |
01:16:35.060
of this protein, et cetera, et cetera.
link |
01:16:36.380
So this information is extremely useful.
link |
01:16:39.900
And it's already there.
link |
01:16:41.260
So while it's tempting to, you know,
link |
01:16:44.180
to do a complete ab initio,
link |
01:16:46.060
so you just have a protein sequence and nothing else,
link |
01:16:49.540
the reality is such that we are overwhelmed with this data.
link |
01:16:54.220
So why not use it?
link |
01:16:56.500
And so yeah, so I'm looking forward
link |
01:16:59.220
to reading this paper.
link |
01:17:01.460
It does seem to, like they've,
link |
01:17:03.420
in the previous version of AlphaFold,
link |
01:17:05.100
they didn't, for this evolutionary similarity thing,
link |
01:17:09.780
they didn't use machine learning for that.
link |
01:17:12.980
Or they, rather they used it as like the input
link |
01:17:15.620
to the entirety of the neural net,
link |
01:17:17.900
like the features derived from the similarity.
link |
01:17:22.020
It seems like there's some kind of quote, unquote,
link |
01:17:24.660
iterative thing where it seems to be part of the,
link |
01:17:29.660
part of the learning process is the incorporation
link |
01:17:32.580
of this evolutionary similarity.
link |
01:17:34.260
Yeah, I don't think there is a bioRxiv paper, right?
link |
01:17:36.940
There's nothing.
link |
01:17:37.780
There's nothing.
link |
01:17:38.620
There's a blog post that's written
link |
01:17:40.700
by a marketing team essentially,
link |
01:17:42.620
which, you know, it has some scientific similarity
link |
01:17:48.460
probably to the actual methodology used,
link |
01:17:51.820
but it could be, it's like interpreting scripture.
link |
01:17:55.260
It could be, it could be just poetic interpretations
link |
01:17:58.820
of the actual work as opposed
link |
01:18:00.060
to direct connection to the work.
link |
01:18:01.940
So now speaking about protein folding, right?
link |
01:18:04.300
So, so, so, you know, in order to answer the question,
link |
01:18:06.860
whether or not we have solved this, right?
link |
01:18:09.460
So we need to go back to the beginning
link |
01:18:11.780
of our conversation, you know,
link |
01:18:13.580
with the realization that, you know,
link |
01:18:15.020
an average protein is that typically what the CASP
link |
01:18:21.060
has been focusing on is the, you know,
link |
01:18:24.100
the, this competition has been focusing on the single,
link |
01:18:27.260
maybe two domain proteins that are still very compact.
link |
01:18:31.060
And even those ones are extremely challenging to solve, right?
link |
01:18:35.380
But now we talk about, you know,
link |
01:18:37.620
an average protein that has two, three protein domains.
link |
01:18:42.380
If you look at the proteins that are in charge
link |
01:18:46.900
of the, you know, of the process with the neural system, right?
link |
01:18:52.740
Perhaps one of the most recently evolved sort of systems
link |
01:19:00.220
in the organism, right?
link |
01:19:04.140
All of them, well, the majority of them
link |
01:19:06.380
are highly multi domain proteins.
link |
01:19:09.020
So they are, you know, some of them have five, six, seven,
link |
01:19:13.540
you know, and more domains, right?
link |
01:19:16.860
And, you know, we are very far away
link |
01:19:20.020
from understanding how these proteins are folded.
link |
01:19:22.460
So the complexity of the protein matters here,
link |
01:19:24.500
the complexity of the protein modules
link |
01:19:27.980
or the protein domains.
link |
01:19:30.260
So you're saying solved.
link |
01:19:34.260
So the definition of solved here
link |
01:19:35.980
is particularly the CASP competition,
link |
01:19:38.620
achieving human level, not human level,
link |
01:19:41.740
achieving experimental level performance
link |
01:19:45.620
on these particular sets of proteins
link |
01:19:48.540
that have been used in these competitions.
link |
01:19:50.300
Well, I mean, you know, I do think that, you know,
link |
01:19:54.700
especially with regards to AlphaFold, you know,
link |
01:19:57.780
it is able to, you know, to solve, you know,
link |
01:20:04.060
at the near experimental level,
link |
01:20:09.020
pretty big majority of the more compact proteins,
link |
01:20:14.980
like or protein domains, because again,
link |
01:20:17.460
in order to understand how the overall protein,
link |
01:20:22.300
you know, multi domain protein folds,
link |
01:20:24.660
we do need to understand the structure
link |
01:20:26.820
of its individual domains.
link |
01:20:28.780
I mean, unlike if you look at AlphaZero or like even MuZero,
link |
01:20:35.180
if you look at that work, you know, it's nice,
link |
01:20:37.980
reinforcement learning, self playing mechanisms are nice
link |
01:20:41.140
because it's all in simulation.
link |
01:20:42.420
So you can learn from just huge amounts,
link |
01:20:45.940
like you don't need data with like the problem with proteins,
link |
01:20:49.780
like the size, I forget how many 3D structures have been mapped,
link |
01:20:54.780
but the training data is very small, no matter what.
link |
01:20:57.380
It's like millions, maybe a one or two millions,
link |
01:21:00.060
something like that.
link |
01:21:01.180
But some very small number,
link |
01:21:02.700
but like it doesn't seem like that's scalable.
link |
01:21:06.580
There has to be, I don't know,
link |
01:21:09.140
it feels like you want to somehow 10x the data
link |
01:21:12.860
or 100x the data somehow.
link |
01:21:15.460
Yes, but we also can take advantage of homology models,
link |
01:21:20.460
right, so the models that are of very good quality
link |
01:21:24.460
because they are essentially obtained
link |
01:21:28.460
based on the evolutionary information.
link |
01:21:31.220
So there is a potential to enhance this information
link |
01:21:36.220
and use it again to empower the training set.
link |
01:21:41.220
And it's, I think, I am actually very optimistic.
link |
01:21:46.220
I think it's been one of these sort of, you know,
link |
01:21:54.500
turning events where you have a system
link |
01:21:59.500
that is, you know, a machine learning system
link |
01:22:11.300
that is truly better than the sort of the more conventional
link |
01:22:15.740
biophysics based methods.
link |
01:22:17.940
That's a huge leap.
link |
01:22:19.380
This is one of those fun questions,
link |
01:22:21.260
but where would you put it in the ranking
link |
01:22:26.740
of the greatest breakthroughs in artificial intelligence
link |
01:22:29.740
history?
link |
01:22:31.740
So like, okay, so let's see who's in the running.
link |
01:22:34.940
Maybe you can correct me.
link |
01:22:35.860
So you got like AlphaZero and AlphaGo beating, you know,
link |
01:22:41.260
beating the world champion at the game of Go.
link |
01:22:44.540
Thought to be impossible like 20 years ago,
link |
01:22:48.260
or at least the AI community was highly skeptical.
link |
01:22:51.380
Then you got like also Deep Blue originally, Kasparov.
link |
01:22:55.100
You have deep learning itself,
link |
01:22:56.300
like the, maybe what would you say,
link |
01:22:58.340
the AlexNet ImageNet moment.
link |
01:23:00.980
So the first network achieving human level performance,
link |
01:23:04.820
super not, that's not true.
link |
01:23:07.900
Achieving like a big leap in performance
link |
01:23:11.020
on the computer vision problem.
link |
01:23:14.460
There is OpenAI, the whole like GPT-3,
link |
01:23:19.020
that whole space of transformers and language models,
link |
01:23:23.060
just achieving this incredible performance
link |
01:23:27.140
of application of neural networks to language models.
link |
01:23:31.820
Boston Dynamics, pretty cool.
link |
01:23:33.580
Like robotics, even though people are like, there's no AI.
link |
01:23:38.220
No, no, there's no machine learning currently,
link |
01:23:41.560
but AI is much bigger than machine learning.
link |
01:23:44.020
Yes.
link |
01:23:45.340
So that just the engineering aspect,
link |
01:23:48.900
I would say it's one of the greatest accomplishments
link |
01:23:50.820
in engineering side.
link |
01:23:52.900
Engineering meaning like mechanical engineering
link |
01:23:56.220
of robotics ever.
link |
01:23:58.020
Then of course, autonomous vehicles,
link |
01:23:59.540
you can argue for Waymo,
link |
01:24:01.340
which is like the Google self driving car,
link |
01:24:03.620
or you can argue for Tesla,
link |
01:24:05.500
which is like actually being used
link |
01:24:07.900
by hundreds of thousands of people
link |
01:24:09.900
on the road today, a machine learning system.
link |
01:24:13.740
And I don't know if you can, what else is there?
link |
01:24:17.580
But I think that's it.
link |
01:24:18.420
So, and then AlphaFold,
link |
01:24:20.020
many people are saying is up there, potentially number one,
link |
01:24:23.340
would you put them at number one?
link |
01:24:24.860
Well, in terms of the impact on the science
link |
01:24:29.860
and on the society beyond,
link |
01:24:31.900
it's definitely to me would be one of the, you know.
link |
01:24:37.460
Top three, what do you want?
link |
01:24:39.420
Maybe, I mean, I'm probably not the best person
link |
01:24:43.060
to answer that, but I do have,
link |
01:24:48.060
I remember my, you know, back in, I think, 1997,
link |
01:24:54.540
when Deep Blue beat Kasparov,
link |
01:24:58.220
it was, I mean, it was a shock.
link |
01:25:01.900
I mean, it was, and I think for the, you know,
link |
01:25:07.980
for a pretty substantial part of the world,
link |
01:25:14.220
that especially people who have some, you know,
link |
01:25:19.140
some experience with chess, right?
link |
01:25:21.780
And realizing how incredibly human this game is,
link |
01:25:25.660
how, you know, how much of a brainpower you need,
link |
01:25:29.620
you know, to reach those, you know,
link |
01:25:32.740
those levels of grandmasters, right, level.
link |
01:25:35.980
Yeah, I mean, it's probably one of the first times,
link |
01:25:37.940
and how good Kasparov was.
link |
01:25:39.580
And again, yeah, so Kasparov is actually one
link |
01:25:42.660
of the best ever, right?
link |
01:25:45.580
And you get a machine that beats him, right?
link |
01:25:48.100
So it's, you know.
link |
01:25:48.940
First time a machine probably beat a human at that scale
link |
01:25:52.060
of a thing, of anything.
link |
01:25:53.740
Yes.
link |
01:25:54.580
Yes, so that was, to me, that was like, you know,
link |
01:25:57.220
one of the groundbreaking events in the history of AI.
link |
01:26:00.580
Yeah, that's probably number one.
link |
01:26:01.900
I probably, like, we don't, it's hard to remember.
link |
01:26:05.420
It's like Muhammad Ali versus, I don't know,
link |
01:26:08.140
any other Mike Tyson, something like that.
link |
01:26:09.860
It's like, nah, you got to put Muhammad Ali
link |
01:26:11.940
at number one, same with Deep Blue,
link |
01:26:15.300
even though it's not machine learning based.
link |
01:26:19.260
Still, it uses advanced search,
link |
01:26:21.540
and search is an integral part of AI, right?
link |
01:26:24.060
So, as you said, it's...
link |
01:26:25.380
People don't think of it that way at this moment.
link |
01:26:27.660
In vogue currently, search is not seen
link |
01:26:30.900
as a fundamental aspect of intelligence,
link |
01:26:34.220
but it very well, and very likely, is.
link |
01:26:37.700
In fact, I mean, that's what neural networks are.
link |
01:26:39.860
They're just performing search on the space of parameters.
link |
01:26:42.900
And it's all search.
link |
01:26:45.540
All of intelligence is some form of search,
link |
01:26:47.740
and you just have to become cleverer and cleverer
link |
01:26:49.660
at that search problem.
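As a rough illustration of the "it's all search" framing (an editorial sketch, not something spelled out in the conversation), fitting a model can be written as a search over parameter values that keeps whichever candidate scores best. The toy data, the `loss` function, and the `hill_climb` helper below are all hypothetical, and real neural networks are normally trained with gradient-based methods rather than this random local search.

```python
import random

# Toy data generated from y = 3x; the "true" parameter 3.0 is assumed for illustration.
DATA = [(x, 3.0 * x) for x in range(1, 6)]


def loss(w):
    """Sum of squared errors of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in DATA)


def hill_climb(w=0.0, steps=500, step_size=0.5, seed=0):
    """Random local search: propose a nearby w, keep it only if the loss drops."""
    rng = random.Random(seed)
    best_w, best_loss = w, loss(w)
    for _ in range(steps):
        candidate = best_w + rng.gauss(0.0, step_size)
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best_w, best_loss = candidate, candidate_loss
    return best_w, best_loss


if __name__ == "__main__":
    w, final_loss = hill_climb()
    print(f"found w={w:.3f} with loss={final_loss:.5f}")
```

Because a candidate is accepted only when it lowers the loss, the search never gets worse; being "cleverer" at the search would mean proposing better candidates, which is roughly what gradient information provides.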
link |
01:26:50.900
And I also have another one that you didn't mention
link |
01:26:53.980
that's one of my favorite ones is...
link |
01:26:58.260
So, you probably heard of this.
link |
01:26:59.860
It's, I think it's called Deep Rembrandt.
link |
01:27:03.420
It's the project where they trained...
link |
01:27:06.780
I think there was a collaboration between the experts
link |
01:27:11.580
in Rembrandt painting in Netherlands,
link |
01:27:15.500
and a group, an artificial intelligence group,
link |
01:27:18.300
where they train an algorithm to replicate the style
link |
01:27:22.140
of the Rembrandt, and they actually printed a portrait
link |
01:27:26.980
that never existed before in the style of Rembrandt.
link |
01:27:31.020
They, I think they printed it only sort of on the canvas
link |
01:27:37.540
that using pretty much same types of paints
link |
01:27:41.420
and stuff, to me, it was mind blowing.
link |
01:27:44.020
Yeah, and the space of art, that's interesting.
link |
01:27:46.900
There hasn't been, maybe that's it,
link |
01:27:50.060
but I think there hasn't been an ImageNet moment yet
link |
01:27:54.540
in the space of art.
link |
01:27:56.740
You haven't been able to achieve
link |
01:27:58.580
superhuman level performance in the space of art,
link |
01:28:01.380
even though there was a big famous thing
link |
01:28:04.620
where a piece of art was purchased,
link |
01:28:07.620
I guess, for a lot of money.
link |
01:28:08.660
Yes. Yeah.
link |
01:28:09.740
But it's still, people are like in the space of music,
link |
01:28:13.260
at least, it's clear that human created pieces
link |
01:28:18.260
are much more popular.
link |
01:28:21.740
So there hasn't been a moment where it's like,
link |
01:28:24.420
oh, this is, where now, I would say in the space of music,
link |
01:28:28.820
what makes a lot of money?
link |
01:28:30.180
We're talking about serious money.
link |
01:28:32.140
It's music and movies, or like shows and so on,
link |
01:28:35.340
and entertainment.
link |
01:28:36.660
There hasn't been a moment where AI created,
link |
01:28:41.300
AI was able to create a piece of music,
link |
01:28:44.500
or a piece of cinema, or like Netflix show,
link |
01:28:49.860
that is sufficiently popular to make a ton of money.
link |
01:28:55.740
And that moment would be very, very powerful,
link |
01:28:58.980
because that's an AI system being used
link |
01:29:02.060
to make a lot of money.
link |
01:29:03.060
And like direct, of course, AI tools,
link |
01:29:05.500
like even Premiere, audio editing, all the editing,
link |
01:29:07.980
everything I do, to edit this podcast,
link |
01:29:09.820
there's a lot of AI involved.
link |
01:29:11.380
I want, actually, this is a program,
link |
01:29:13.300
I wanna talk to those folks just
link |
01:29:14.500
because I wanna nerd out, it's called iZotope,
link |
01:29:16.740
I don't know if you're familiar with it.
link |
01:29:18.100
They have a bunch of tools of audio processing,
link |
01:29:20.220
and they have, I think they're Boston based.
link |
01:29:23.100
Just, it's so exciting to be, to use it,
link |
01:29:26.420
like on the audio here,
link |
01:29:28.260
because it's all machine learning.
link |
01:29:30.420
It's not, because most audio production stuff
link |
01:29:35.820
is like any kind of processing you do,
link |
01:29:37.580
it's very basic signal processing.
link |
01:29:39.540
And you're tuning knobs and so on.
link |
01:29:42.020
They have all of that, of course,
link |
01:29:43.620
but they also have all of this machine learning stuff,
link |
01:29:46.060
like where you actually give it training data,
link |
01:29:48.580
you select parts of the audio you train on,
link |
01:29:51.780
you train on it, and it figures stuff out, it's great.
link |
01:29:56.860
It's able to detect, like the ability of it
link |
01:30:00.660
to be able to separate voice and music, for example,
link |
01:30:04.860
or voice and anything is incredible.
link |
01:30:07.260
Like it just, it's clearly exceptionally good
link |
01:30:11.180
at applying these different neural network models
link |
01:30:14.980
to separate the different kinds of signals from the audio.
link |
01:30:20.700
Okay, so that's really exciting.
link |
01:30:22.300
Photoshop, Adobe, people also use it,
link |
01:30:24.620
but to generate a piece of music
link |
01:30:28.340
that will sell millions, a piece of art, yeah.
link |
01:30:32.020
No, I agree, and that's, as I mentioned,
link |
01:30:37.020
I offer my AI class and an integral part of this
link |
01:30:43.620
is a project, right?
link |
01:30:44.620
So it's my favorite, ultimate favorite part
link |
01:30:47.300
because typically we have these project presentations
link |
01:30:51.340
the last two weeks of the classes,
link |
01:30:53.660
right before the Christmas break,
link |
01:30:56.140
and it adds this cool excitement.
link |
01:31:00.260
And every time, I'm amazed with some projects
link |
01:31:05.260
that people come up with.
link |
01:31:08.420
And so, and quite a few of them are actually,
link |
01:31:12.780
they have some link to arts.
link |
01:31:17.780
I mean, I think last year, we had a group
link |
01:31:21.900
who designed an AI producing haikus, Japanese poems.
link |
01:31:28.180
Oh, wow.
link |
01:31:29.500
So, and some of them, so it got trained
link |
01:31:33.260
on the English-based haikus, haikus, right?
link |
01:31:37.260
So, and some of them, they get to present
link |
01:31:42.340
like the top selection, they were pretty good.
link |
01:31:45.140
I mean, of course, I'm not a specialist,
link |
01:31:47.780
but you read them and you see it.
link |
01:31:50.340
It seems profound.
link |
01:31:51.420
Yes, yeah, it seems reasonable.
link |
01:31:53.540
So it's kinda cool.
link |
01:31:55.820
We also had a couple of projects
link |
01:31:57.820
where people tried to teach AI
link |
01:32:00.740
how to play rock music, classical music,
link |
01:32:05.780
I think, and popular music.
link |
01:32:10.380
Interestingly enough, classical music
link |
01:32:14.260
was among the most difficult ones.
link |
01:32:17.100
And of course, if you look at the grandmasters of music,
link |
01:32:22.100
like Bach, right, so there's a lot of almost math.
link |
01:32:28.100
Yeah, well, he's very mathematical, right?
link |
01:32:29.940
Yeah, exactly.
link |
01:32:30.780
So this is, I would imagine that at least some style
link |
01:32:34.780
of this music could be picked up,
link |
01:32:37.100
but then you have completely different spectrum
link |
01:32:40.100
of classical composers.
link |
01:32:42.460
And so, you know, and you know,
link |
01:32:45.780
I think it's a little bit of a challenge
link |
01:32:49.060
and so it's almost like you don't have to sort of
link |
01:32:55.740
look at the data, you just listen to it
link |
01:32:57.980
and say, nah, that's not it.
link |
01:33:00.620
Not yet.
link |
01:33:01.460
That's not it.
link |
01:33:02.300
Yeah, that's how I feel too.
link |
01:33:03.340
There's OpenAI's, I think, Open Muse
link |
01:33:05.820
or something like that, the system.
link |
01:33:07.540
It's cool, but it's like, yeah,
link |
01:33:09.740
it's not compelling for some reason.
link |
01:33:12.060
It could be a psychological reason too.
link |
01:33:14.180
Maybe we need to have a human being,
link |
01:33:17.540
a tortured soul behind the music.
link |
01:33:19.620
I don't know.
link |
01:33:20.660
Yeah, no, absolutely, I completely agree.
link |
01:33:23.900
But yeah, whether or not we'll have,
link |
01:33:26.540
one day we'll have, you know,
link |
01:33:29.140
a song written by an AI engine
link |
01:33:33.340
to be in like in top charts.
link |
01:33:36.060
Yeah.
link |
01:33:36.900
Musical charts.
link |
01:33:37.900
I wouldn't be surprised.
link |
01:33:40.500
I wouldn't be surprised.
link |
01:33:43.340
I wonder if we already have one.
link |
01:33:44.740
It just hasn't been announced.
link |
01:33:46.420
We would know.
link |
01:33:49.980
How hard is the multi-protein folding problem?
link |
01:33:53.900
Is that kind of something you've already mentioned,
link |
01:33:57.060
which is baked into this idea
link |
01:33:58.660
of greater and greater complexity of proteins?
link |
01:34:01.140
Like multi-domain proteins,
link |
01:34:03.260
does that basically become multi-protein complexes?
link |
01:34:08.860
Yes, you got it right.
link |
01:34:10.580
So it's sort of, it has the components of both,
link |
01:34:16.580
of protein folding and protein-protein interactions.
link |
01:34:21.860
Because in order for these domains,
link |
01:34:24.420
I mean, many of these proteins actually,
link |
01:34:27.260
they never form a stable structure.
link |
01:34:31.140
One of my favorite proteins,
link |
01:34:32.980
and pretty much everyone who works in the,
link |
01:34:37.700
I know, who I know who works with proteins,
link |
01:34:41.740
they always have their favorite proteins, right?
link |
01:34:44.860
So one of my favorite proteins,
link |
01:34:47.660
probably my favorite protein,
link |
01:34:49.140
the one that I worked on when I was a postdoc,
link |
01:34:51.420
is the so-called postsynaptic density 95, PSD-95, protein.
link |
01:34:56.180
So it's one of the key actors
link |
01:35:00.500
in the majority of neurological processes
link |
01:35:03.780
at the molecular level.
link |
01:35:04.700
So it's essentially, it's a key player
link |
01:35:11.100
in the postsynaptic density.
link |
01:35:13.500
So this is the crucial part of the synapse,
link |
01:35:17.220
where a lot of these chemical processes are happening.
link |
01:35:22.460
So it has five domains, right?
link |
01:35:26.260
So five protein domains, a pretty large protein,
link |
01:35:30.860
I think 600-something amino acids, but, you know,
link |
01:35:37.220
the way it's organized itself, it's flexible, right?
link |
01:35:41.300
So it acts as a scaffold.
link |
01:35:43.860
So it is used to bring in other proteins.
link |
01:35:49.300
So they start acting in the orchestrated manner, right?
link |
01:35:54.300
So, and the type of the shape of this protein,
link |
01:35:58.820
it's in a way, there are some stable parts of this protein,
link |
01:36:02.540
but there are some flexible.
link |
01:36:04.500
And this flexibility is built into the protein
link |
01:36:08.580
in order to become sort of this multifunctional machine.
link |
01:36:13.140
So do you think that kind of thing is also learnable
link |
01:36:16.500
through the AlphaFold 2 kind of approach?
link |
01:36:19.380
I mean, the time will tell.
link |
01:36:22.420
Is it another level of complexity?
link |
01:36:24.460
Is it like, how big of a jump in complexity
link |
01:36:27.340
is that whole thing?
link |
01:36:28.180
To me, it's yet another level of complexity
link |
01:36:31.380
because when we talk about protein-protein interactions,
link |
01:36:35.180
and there is actually a different challenge for this
link |
01:36:38.860
called CAPRI.
link |
01:36:40.020
And so this, that is focused specifically
link |
01:36:43.460
on macromolecular interactions,
link |
01:36:45.700
protein-protein, protein-DNA, et cetera.
link |
01:36:48.580
So, but it's, you know, there are different mechanisms
link |
01:36:53.580
that govern molecular interactions
link |
01:36:58.780
and that need to be picked up,
link |
01:37:00.780
say by a machine learning algorithm.
link |
01:37:03.700
Interestingly enough, we actually,
link |
01:37:06.580
we participated for a few years in this competition.
link |
01:37:11.780
We typically don't participate in competitions,
link |
01:37:14.980
I don't know, don't have enough time, you know,
link |
01:37:19.860
because it's a very intensive process.
link |
01:37:23.740
But we participated back in, you know,
link |
01:37:28.220
about 10 years ago or so.
link |
01:37:30.620
And the way we enter this competition,
link |
01:37:32.700
so we design a scoring function, right?
link |
01:37:35.460
So the function that evaluates whether or not
link |
01:37:38.140
your protein-protein interaction
link |
01:37:40.580
is supposed to look like experimentally solved, right?
link |
01:37:43.380
So the scoring function is a very critical part
link |
01:37:45.940
of the model prediction.
link |
01:37:49.840
So we designed it to be a machine learning one.
link |
01:37:52.740
And so it was one of the first machine learning
link |
01:37:56.660
based scoring functions used in CAPRI.
link |
01:38:00.020
And, you know, we essentially, you know,
link |
01:38:03.980
learned what should contribute,
link |
01:38:06.620
what are the critical components contributing
link |
01:38:08.900
to the protein-protein interactions.
link |
01:38:10.660
So this could be converted into a learning problem
link |
01:38:13.380
and thereby it could be learned.
link |
01:38:15.660
I believe so, yes.
link |
01:38:17.060
Do you think AlphaFold 2 or something similar to it
link |
01:38:20.500
from DeepMind or somebody else will be,
link |
01:38:24.340
will result in a Nobel Prize or multiple Nobel Prizes?
link |
01:38:28.700
So like, you know, obviously, maybe not so obviously,
link |
01:38:33.340
you can't give a Nobel Prize to a computer program.
link |
01:38:38.060
You, at least for now, give it to the designers
link |
01:38:41.020
of that program.
link |
01:38:42.180
But is, do you see one or multiple Nobel Prizes
link |
01:38:46.100
where AlphaFold 2 is like a large percentage
link |
01:38:51.740
of what that prize is given for?
link |
01:38:54.900
Would it lead to discoveries at the level of Nobel Prizes?
link |
01:39:00.540
I mean, I think we are definitely destined
link |
01:39:05.420
to see the Nobel Prize becoming sort of,
link |
01:39:08.740
to be evolving with the evolution of science.
link |
01:39:12.340
And the evolution of science is such
link |
01:39:14.540
that it now becomes like really multifaceted, right?
link |
01:39:17.860
So where you don't really have like a unique discipline,
link |
01:39:21.340
you have sort of the, a lot of cross disciplinary talks
link |
01:39:25.680
in order to achieve sort of, you know,
link |
01:39:28.500
really big advancements, you know.
link |
01:39:32.420
So I think, you know, the computational methods
link |
01:39:39.220
will be acknowledged in one way or another.
link |
01:39:42.540
And as a matter of fact, you know,
link |
01:39:46.900
they were first acknowledged back in 2013, right?
link |
01:39:50.620
Where, you know, the first three people were,
link |
01:39:56.060
you know, awarded the Nobel Prize for studying
link |
01:39:59.740
the protein folding, right, the principle.
link |
01:40:01.540
And, you know, I think all three of them
link |
01:40:03.860
are computational biophysicists, right?
link |
01:40:06.940
So, you know, that I think is, is unavoidable, you know.
link |
01:40:14.580
It will come with a time.
link |
01:40:18.500
The fact that, you know, AlphaFold
link |
01:40:22.940
and, you know, similar approaches,
link |
01:40:24.540
because again, it's a matter of time
link |
01:40:26.380
that people will embrace the, this, you know, principle
link |
01:40:31.740
and we'll see more and more such, you know,
link |
01:40:34.980
such tools coming into play.
link |
01:40:36.980
But, you know, these methods will be critical
link |
01:40:42.420
in a scientific discovery, no, no doubts about it.
link |
01:40:47.700
On the engineering side, maybe a dark question,
link |
01:40:51.500
but do you think it's possible
link |
01:40:53.420
to use these machine learning methods
link |
01:40:55.180
to start to engineer proteins?
link |
01:40:59.020
And the next question is something,
link |
01:41:03.180
quite a few biologists are against, some are for,
link |
01:41:06.100
for study purposes is to engineer viruses.
link |
01:41:09.660
Do you think machine learning, like something like AlphaFold,
link |
01:41:12.660
could be used to engineer viruses?
link |
01:41:14.820
So, to answer the first question, you know,
link |
01:41:17.020
it has been, you know, a part of the research
link |
01:41:21.700
in the protein science, the protein design
link |
01:41:24.540
is, you know, a very prominent area of research.
link |
01:41:29.180
Of course, you know, one of the pioneers
link |
01:41:31.060
is David Baker and the Rosetta algorithm
link |
01:41:33.660
that essentially was doing the, the de novo design
link |
01:41:38.220
and was used to design new proteins, you know.
link |
01:41:41.500
And design of proteins means design of functions.
link |
01:41:44.220
So like when you design a protein, you can control,
link |
01:41:47.340
I mean, the whole point of a protein
link |
01:41:49.100
with the protein structure comes a function
link |
01:41:52.180
like it's doing something.
link |
01:41:53.700
Correct.
link |
01:41:54.540
So you can design different things.
link |
01:41:56.060
So you can, yeah, so you can, well, you can look
link |
01:41:58.740
at the proteins from the functional perspective.
link |
01:42:00.700
You can also look at the proteins
link |
01:42:02.740
from the structural perspective, right?
link |
01:42:04.220
So the structural building blocks.
link |
01:42:05.740
So if you want to have a building block of a certain shape,
link |
01:42:08.900
you can try to achieve it by, you know,
link |
01:42:11.220
introducing a new protein sequence
link |
01:42:13.180
and predicting, you know, how it will fold.
link |
01:42:17.300
So, so with that, I mean, it's, it's a natural,
link |
01:42:22.060
one of the, you know, natural applications
link |
01:42:25.860
of these algorithms.
link |
01:42:28.260
Now talking about engineering a virus.
link |
01:42:34.180
With machine learning.
link |
01:42:35.220
With machine learning, right?
link |
01:42:36.420
So, so, well, you know, so luckily for us,
link |
01:42:41.740
I mean, we don't have that much data, right?
link |
01:42:47.700
We actually, right now, one of the projects
link |
01:42:50.180
that we are carrying on in the lab
link |
01:42:53.020
is we're trying to develop a machine learning algorithm
link |
01:42:57.020
that determines the, whether or not
link |
01:43:00.060
the current strain is pathogenic.
link |
01:43:02.740
And the current strain of the coronavirus.
link |
01:43:04.620
Of the virus.
link |
01:43:06.140
I mean, so, so there are applications to coronaviruses
link |
01:43:09.020
because we have strains of SARS-CoV-2, also SARS-CoV,
link |
01:43:13.020
MERS that are pathogenic, but we also have strains
link |
01:43:16.340
of other coronaviruses that are, you know, not pathogenic.
link |
01:43:20.220
I mean, the common cold viruses and, you know,
link |
01:43:23.620
and some other ones, right?
link |
01:43:25.620
So, so, pathogenic, meaning spreading.
link |
01:43:29.020
Pathogenic means actually inflicting damage, correct.
link |
01:43:35.380
There are also some, you know, seasonal versus pandemic
link |
01:43:39.180
strains of influenza, right?
link |
01:43:41.820
And determining what are the molecular determinants, right?
link |
01:43:45.540
So that are built into the protein sequence,
link |
01:43:48.340
into the gene sequence, right?
link |
01:43:50.740
So, and whether or not the machine learning can determine
link |
01:43:55.020
those components, right?
link |
01:43:58.460
Oh, interesting.
link |
01:43:59.300
So like using machine learning to,
link |
01:44:00.700
that's really interesting to, to, to, to given,
link |
01:44:03.980
give the input is like what the entire,
link |
01:44:07.420
the protein sequence, and then determine
link |
01:44:09.780
if this thing is going to be able to do damage
link |
01:44:12.380
to, to a biological system.
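To make "the input is the protein sequence" a little more concrete, here is a minimal sketch of one generic way a sequence could be turned into numbers for a learning algorithm. The conversation does not say how the lab actually represents sequences; k-mer counting is simply a common baseline, and the helper names `kmer_counts` and `to_vector`, plus the toy string, are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping length-k substrings (k-mers) in a sequence string."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def to_vector(sequence, vocabulary, k=3):
    """Turn a sequence into a fixed-length feature list over a chosen k-mer vocabulary."""
    counts = kmer_counts(sequence, k)
    return [counts[kmer] for kmer in vocabulary]


if __name__ == "__main__":
    toy = "MKVMKV"  # made-up string, not a real protein
    print(kmer_counts(toy, k=3))
    print(to_vector(toy, ["MKV", "KVM", "AAA"], k=3))
```

A supervised classifier would then be trained on such vectors paired with pathogenic or non-pathogenic labels; which features and models actually work for this task is the open research question being discussed.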
link |
01:44:14.660
Yeah.
link |
01:44:15.980
So, so, I mean, so.
link |
01:44:16.820
It's a good machine learning part.
link |
01:44:17.660
You're saying we don't have enough data for that?
link |
01:44:19.420
We, I mean, for, for this specific one we do,
link |
01:44:22.660
we might actually, you know, have to back up on this.
link |
01:44:25.540
Cause we, we're still in the process.
link |
01:44:27.260
There was one work that appeared in bioRxiv
link |
01:44:31.660
by Eugene Koonin, who is one of these, you know,
link |
01:44:34.900
pioneers in, in, in evolutionary genomics.
link |
01:44:39.020
And they tried to look at this, but, you know,
link |
01:44:42.820
the methods were sort of standard, you know,
link |
01:44:46.060
supervised learning methods.
link |
01:44:48.620
And now the question is, you know,
link |
01:44:51.300
can you advance it further by, by using, you know,
link |
01:44:56.620
not so standard methods, you know,
link |
01:44:58.580
so there's obviously a lot of hope in, in transfer learning
link |
01:45:02.700
where you can actually try to transfer the information
link |
01:45:06.220
that the machine learning learns
link |
01:45:07.700
about the proper protein sequences, right?
link |
01:45:11.340
And, you know, so, so there is some promise
link |
01:45:16.140
in going this direction.
link |
01:45:17.260
But if we have this, it would be extremely useful
link |
01:45:20.460
because then we could essentially forecast
link |
01:45:22.940
the potential mutations that would make
link |
01:45:24.860
the current strain more or less pathogenic, right?
link |
01:45:27.940
Anticipate, anticipate them from a vaccine development
link |
01:45:31.140
for the treatment, antiviral drug development.
link |
01:45:34.540
That would be a very crucial task.
link |
01:45:36.820
But you could also use that system to then say,
link |
01:45:42.140
how would we potentially modify this virus
link |
01:45:45.220
to make it more pathogenic?
link |
01:45:48.780
That's true.
link |
01:45:49.620
That's true.
link |
01:45:50.460
I mean, you know, the, again, the hope is,
link |
01:45:57.900
well, several things, right?
link |
01:45:59.660
So one is that, you know, it's,
link |
01:46:02.140
even if you design a, you know, a sequence, right?
link |
01:46:06.780
So to carry out the actual experimental biology
link |
01:46:12.540
to ensure that all the components are working, you know,
link |
01:46:16.820
is a completely different matter.
link |
01:46:19.780
Yes.
link |
01:46:20.860
Then the, you know, we've seen in the past,
link |
01:46:24.380
there could be some regulation the moment
link |
01:46:27.660
the scientific community recognizes
link |
01:46:30.420
that it's now becoming no longer a sort of a fun puzzle
link |
01:46:34.620
to, you know, for machine learning.
link |
01:46:36.660
Could be a weapon.
link |
01:46:37.860
Yes. So then there might be some regulation.
link |
01:46:40.420
So I think back in what, 2015, there was, you know,
link |
01:46:45.420
there was an issue on regulating the research
link |
01:46:49.500
on influenza strains, right?
link |
01:46:52.460
That there were several groups, you know,
link |
01:46:55.540
use sort of the mutation analysis to determine
link |
01:46:59.660
whether or not this strain will jump
link |
01:47:01.780
from one species to another.
link |
01:47:03.260
And I think there was like a half a year moratorium
link |
01:47:06.500
on the research, on the paper published
link |
01:47:09.700
until, you know, scientists, you know, analyzed it
link |
01:47:13.540
and decided that it's actually safe.
link |
01:47:16.420
I forgot what that's called.
link |
01:47:17.620
Something of function, test of function.
link |
01:47:20.060
Gain of function.
link |
01:47:20.900
Gain of function.
link |
01:47:21.980
Yeah, gain of function, loss of function.
link |
01:47:23.620
That's right.
link |
01:47:24.460
Sorry.
link |
01:47:26.380
It's like, let's watch this thing mutate for a while
link |
01:47:29.620
to see like, to see what kind of things we can observe.
link |
01:47:33.740
I guess I'm not so much worried about that kind of research
link |
01:47:37.260
if there's a lot of regulation
link |
01:47:38.580
and if it's done very well and with competence and seriously.
link |
01:47:42.740
I am more worried about kind of this, you know,
link |
01:47:46.980
the underlying aspect of this question
link |
01:47:49.580
is more like 50 years from now.
link |
01:47:52.900
Speaking to the Drake equation,
link |
01:47:54.900
one of the parameters in the Drake equation
link |
01:47:57.260
is how long civilizations last.
link |
01:47:59.780
And that seems to be the most important value actually
link |
01:48:03.820
for calculating if there are other intelligent alien
link |
01:48:06.660
civilizations out there.
link |
01:48:08.020
That's where there's most variability,
link |
01:48:10.940
assuming like if life, if that percentage
link |
01:48:15.060
that life can emerge is like not zero,
link |
01:48:19.380
like if we're super unique,
link |
01:48:21.260
then it's the how long we last
link |
01:48:23.620
is basically the most important thing.
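For reference, the Drake equation in its standard form is shown below; the conversation mentions it but does not write it out. The last factor, L, is the "how long civilizations last" parameter under discussion.

```latex
N = R_{*} \cdot f_{p} \cdot n_{e} \cdot f_{l} \cdot f_{i} \cdot f_{c} \cdot L
```

Here N is the number of civilizations in our galaxy whose signals might be detectable, R_* is the average rate of star formation, f_p is the fraction of stars with planets, n_e is the number of planets per such star that could support life, f_l, f_i, and f_c are the fractions of those that go on to develop life, intelligence, and detectable communication, and L is the length of time such civilizations keep emitting detectable signals.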
link |
01:48:26.180
So from a selfish perspective,
link |
01:48:28.980
but also from a Drake equation perspective,
link |
01:48:32.020
I'm worried about our civilization lasting.
link |
01:48:35.020
And you kind of think about all the ways
link |
01:48:37.620
in which machine learning can be used
link |
01:48:39.100
to design greater weapons of destruction, right?
link |
01:48:45.700
And I mean, one way to ask that,
link |
01:48:48.580
if you look sort of 50 years from now,
link |
01:48:50.540
100 years from now,
link |
01:48:52.620
would you be more worried about natural pandemics
link |
01:48:55.780
or engineered pandemics?
link |
01:48:59.420
Like who's the better designer of viruses,
link |
01:49:02.620
nature or humans if we look down the line?
link |
01:49:05.980
I think in my view, I would still be worried
link |
01:49:10.140
about the natural pandemics simply because,
link |
01:49:13.660
I mean, the capacity of the nature producing this.
link |
01:49:20.740
It does a pretty good job, right?
link |
01:49:22.700
Yes.
link |
01:49:23.540
And the motivation for using virus,
link |
01:49:25.260
engineering viruses as a weapon is a weird one
link |
01:49:29.020
because maybe you can correct me on this,
link |
01:49:31.460
but it seems very difficult to target a virus, right?
link |
01:49:35.620
The whole point of a weapon, the way a rocket works,
link |
01:49:38.420
you have a starting point, you have an end point
link |
01:49:40.140
and you're trying to hit a target,
link |
01:49:42.380
to hit a target with a virus is very difficult.
link |
01:49:44.700
It's basically just, right?
link |
01:49:47.100
It's, the target would be the human species.
link |
01:49:52.540
Man, yeah, I have a hope in us,
link |
01:49:54.820
I'm forever optimistic that we will not,
link |
01:49:58.300
there's no, there's insufficient evil in the world
link |
01:50:01.620
to lead to that kind of destruction.
link |
01:50:04.580
Well, I also hope that, I mean, that's what we see.
link |
01:50:07.780
I mean, with the way we are getting connected,
link |
01:50:11.780
the world is getting connected.
link |
01:50:14.460
I think it helps for the world to become more transparent.
link |
01:50:21.700
Yeah.
link |
01:50:22.540
So the information spread is,
link |
01:50:27.100
I think it's one of the key things for the society
link |
01:50:31.660
to become more balanced one way or another.
link |
01:50:36.500
This is something that people disagree with me on,
link |
01:50:38.380
but I do think that the kind of secrecy
link |
01:50:41.940
that governments have,
link |
01:50:43.500
so you're kind of speaking more to the other aspects,
link |
01:50:47.100
like research community being more open,
link |
01:50:49.740
companies are being more open,
link |
01:50:52.180
government is still like,
link |
01:50:55.940
we're talking about like military secrets.
link |
01:50:57.900
I think military secrets of the kind
link |
01:51:01.420
that could destroy the world
link |
01:51:03.740
will become also a thing of the 20th century.
link |
01:51:07.340
It'll become more and more open.
link |
01:51:09.340
Like I think nations will lose power
link |
01:51:12.260
in the 21st century,
link |
01:51:13.220
like lose sufficient power to have secrecy.
link |
01:51:15.980
Transparency is more beneficial than secrecy,
link |
01:51:18.900
but of course that's not obvious.
link |
01:51:21.180
Let's hope so.
link |
01:51:22.220
Let's hope so that the governments will become
link |
01:51:27.220
more transparent.
link |
01:51:32.140
So we last talked I think in March or April,
link |
01:51:35.300
what have you learned?
link |
01:51:36.780
How is your philosophical, psychological,
link |
01:51:40.500
biological worldview changed since then?
link |
01:51:43.860
Or you've been studying it nonstop
link |
01:51:46.140
from a computational biology perspective.
link |
01:51:48.940
How is your understanding and thoughts
link |
01:51:50.460
about this virus changed over those months
link |
01:51:53.060
from the beginning to today?
link |
01:51:54.500
One thing that I was really amazed
link |
01:51:58.140
at how efficient the scientific community was.
link |
01:52:03.140
I mean, and even just judging on this very narrow domain
link |
01:52:10.100
of protein structure, understanding the structural
link |
01:52:15.740
characterization of this virus
link |
01:52:17.580
from the components point of view,
link |
01:52:19.900
whole virus point of view.
link |
01:52:21.460
If you look at SARS, right?
link |
01:52:26.060
The something that happened less than 20,
link |
01:52:31.060
but close enough 20 years ago,
link |
01:52:35.020
and you see what, when it happened,
link |
01:52:38.540
what was sort of the response by the scientific community,
link |
01:52:42.500
you see that the structure characterizations did occur,
link |
01:52:47.140
but it took several years, right?
link |
01:52:51.660
Now the things that took several years,
link |
01:52:54.940
it's a matter of months, right?
link |
01:52:56.900
So we see that the research pop up.
link |
01:53:01.620
We are at the unprecedented level
link |
01:53:03.980
in terms of the sequencing, right?
link |
01:53:06.020
Never before have we had a single virus
link |
01:53:11.580
sequenced so many times, you know?
link |
01:53:14.820
So which allows us to actually, to trace very precisely
link |
01:53:20.380
the sort of the evolutionary nature of this virus,
link |
01:53:24.940
what happens, and it's not just this virus independently
link |
01:53:31.020
of everything, it's the sequence of this virus
link |
01:53:35.820
linked, anchored to the specific geographic place,
link |
01:53:39.940
to specific people because our genotype influences
link |
01:53:46.100
also the evolution of this.
link |
01:53:48.900
It's always a host-pathogen evolution that occurs.
link |
01:53:55.420
It'd be cool if we also had a lot more data
link |
01:53:58.020
about the spread of this virus, not maybe,
link |
01:54:02.540
well, it'd be nice if we had it
link |
01:54:05.100
for like contact tracing purposes for this virus,
link |
01:54:08.100
but it'd be also nice if we had it for the study
link |
01:54:10.420
for future viruses to be able to respond and so on.
link |
01:54:13.540
But it's already nice that we have geographical data
link |
01:54:15.980
and the basic data from individual humans, yeah.
link |
01:54:18.420
Exactly, no, I think contact tracing is obviously
link |
01:54:22.820
a key component in understanding the spread of this virus.
link |
01:54:29.340
There is also, there is a number of challenges, right?
link |
01:54:31.700
So XPRIZE is one of them; we just recently
link |
01:54:36.700
took part in this competition.
link |
01:54:40.900
It's the prediction of the number of infections
link |
01:54:46.740
in different regions, so obviously the AI
link |
01:54:52.660
is the main topic in those predictions.
link |
01:54:56.300
Yeah, but it's still the data, I mean, that's a competition,
link |
01:55:00.340
but the data is weak on the training.
link |
01:55:05.340
It's great, it's much more than probably before,
link |
01:55:09.340
but it would be nice if it was really rich.
link |
01:55:12.900
I talked to Michael Mina from Harvard,
link |
01:55:16.780
I mean, he dreams that the community comes together
link |
01:55:19.020
with a weather map, as it were, of viruses,
link |
01:55:22.900
like really high resolution sensors on how,
link |
01:55:27.900
from person to person, the viruses travel,
link |
01:55:29.900
all the different kinds of viruses,
link |
01:55:32.020
because there's a ton of them,
link |
01:55:34.660
and then you'll be able to tell the story
link |
01:55:36.820
that you've spoken about of the evolution of this virus.
link |
01:55:41.100
It's like day to day mutations that are occurring.
link |
01:55:44.820
I mean, that would be fascinating,
link |
01:55:46.100
just from a perspective of study,
link |
01:55:48.700
and from the perspective of being able
link |
01:55:50.220
to respond to future pandemics.
link |
01:55:51.660
That's ultimately what I'm worried about.
link |
01:55:54.980
People love books.
link |
01:55:56.460
Is there some three or whatever number of books,
link |
01:56:01.180
technical, fiction, philosophical, that brought you joy in life,
link |
01:56:06.260
had an impact on your life,
link |
01:56:07.780
and maybe some that you would recommend others?
link |
01:56:11.340
So I'll give you three very different books,
link |
01:56:13.620
and I also have a special runner up, and a...
link |
01:56:17.180
Honorable mention.
link |
01:56:18.500
Yeah, I mean, it's an audiobook,
link |
01:56:21.980
and there's some specific reason behind it.
link |
01:56:25.500
So the first book is something that impacted my earlier stage of life,
link |
01:56:32.460
and I'm probably not going to be very original here.
link |
01:56:36.220
It's Bulgakov's Master and Margarita.
link |
01:56:39.140
So that's probably...
link |
01:56:41.300
Well, not for a Russian, maybe, it's not super original,
link |
01:56:43.860
but it's a really powerful book, even in English.
link |
01:56:47.660
So I read it in English.
link |
01:56:49.180
It is incredibly powerful, and I mean, it's the way it ends,
link |
01:56:55.180
right, so I still have goosebumps when I read the very last sort of...
link |
01:57:01.500
It's called the Epilogue, where it's just so powerful.
link |
01:57:05.780
What impact did it have on you? What ideas?
link |
01:57:07.900
What insights did you get from it?
link |
01:57:09.300
I was just taken by the fact that you have those parallel lives
link |
01:57:21.180
apart by many centuries, right?
link |
01:57:23.420
And somehow they got sort of intertwined into one story.
link |
01:57:30.340
And that, to me, was fascinating.
link |
01:57:33.860
And of course, the romantic part of this book is not just romance,
link |
01:57:41.740
it's like the romance empowered by sort of magic, right?
link |
01:57:46.860
And maybe on top of that, you have some irony,
link |
01:57:51.580
which is unavoidable, right, because it was that Soviet time.
link |
01:57:56.420
But it's very deeply Russian, so that's the wit, the humor, the pain,
link |
01:58:02.860
the love, all of that is one of the books that kind of captures something
link |
01:58:08.060
about Russian culture that people outside of Russia should probably read.
link |
01:58:12.540
I agree.
link |
01:58:13.140
What's the second one?
link |
01:58:14.220
So the second one is, again, another one that it happened...
link |
01:58:19.620
I read it later in my life.
link |
01:58:21.860
I think I read it first time when I was a graduate student.
link |
01:58:27.740
And that's Solzhenitsyn's Cancer Ward.
link |
01:58:31.820
That is an amazingly powerful book.
link |
01:58:36.340
What is it about?
link |
01:58:37.620
It's about, I mean, essentially based on...
link |
01:58:41.620
Solzhenitsyn was diagnosed with cancer when he was reasonably young
link |
01:58:46.820
and he made a full recovery, but so this is about a person
link |
01:58:54.540
who was sentenced for life in one of these camps.
link |
01:59:00.580
And he had some cancer, so he was transported back
link |
01:59:06.820
to one of these Soviet republics, I think, South Asian republics.
link |
01:59:13.460
And the book is about his experience being a prisoner,
link |
01:59:24.820
being a patient in the cancer clinic in a cancer ward
link |
01:59:31.300
surrounded by people, many of which die, right?
link |
01:59:36.540
But in the way it reads, I mean, first of all,
link |
01:59:42.540
later on I read the accounts of the doctors
link |
01:59:47.620
who describe the experiences in the book
link |
01:59:55.180
by the patient as incredibly accurate.
link |
01:59:59.380
So I read that there was some doctors saying that
link |
02:00:04.220
every single doctor should read this book
link |
02:00:07.060
to understand what the patient feels.
link |
02:00:10.620
But again, as with many of Solzhenitsyn's books,
link |
02:00:16.980
it has multiple levels of complexity.
link |
02:00:19.700
And obviously, if you look above the cancer and the patient,
link |
02:00:26.380
I mean, the tumor that was growing and then disappeared
link |
02:00:32.460
in his body with some consequences.
link |
02:00:37.660
I mean, this is allegorically the Soviet...
link |
02:00:44.860
And he actually, when he was asked, he said that
link |
02:00:50.540
this is what made him think about this, how to combine these experiences.
link |
02:00:56.100
Him being a part of the Soviet regime,
link |
02:01:01.020
also being a part of someone sent to Gulag camp, right?
link |
02:01:07.900
And also someone who experienced cancer in his life.
link |
02:01:13.100
The Gulag Archipelago and this book,
link |
02:01:16.540
these are the works that actually made him receive a Nobel Prize.
link |
02:01:22.740
But to me, I've read other books by Solzhenitsyn.
link |
02:01:31.380
This one, to me, is the most powerful one.
link |
02:01:34.700
And by the way, both this one and the previous one you read in Russian?
link |
02:01:38.620
Yes.
link |
02:01:40.220
So now the third book is an English book and it's completely different.
link |
02:01:45.660
So we're switching the gears completely.
link |
02:01:48.540
So this is the book, it's not even a book.
link |
02:01:52.220
It's an essay by John von Neumann called The Computer and the Brain.
link |
02:01:59.580
And that was the book he was writing,
link |
02:02:03.940
knowing that he was dying of cancer.
link |
02:02:07.700
So the book was released back, it's a very thin book, right?
link |
02:02:12.260
But the power, the intellectual power in this book, in this essay is incredible.
link |
02:02:21.220
I mean, you probably know that von Neumann is considered to be one of the biggest thinkers,
link |
02:02:28.340
right?
link |
02:02:29.340
So his intellectual power was incredible, right?
link |
02:02:32.740
And you can actually feel this power in this book where the person is writing, knowing
link |
02:02:38.420
that he will die.
link |
02:02:41.380
The book actually got published only after his death, back in 1958.
link |
02:02:46.300
He died in 1957.
link |
02:02:48.740
But so he tried to put as many ideas that he still hadn't realized.
link |
02:02:59.860
And so this book is very difficult to read because every single paragraph is just compact,
link |
02:03:10.260
is filled with these ideas and the ideas are incredible.
link |
02:03:17.740
So nowadays, so he tried to put the parallels between the brain computing power, the neural
link |
02:03:25.660
system and the computers as they were.
link |
02:03:29.500
That whole year he was working on it, approximately '57.
link |
02:03:33.820
So that was right during his, when he was diagnosed with cancer and he was essentially...
link |
02:03:39.740
Yeah, he's one of those, there's a few folks people mentioned.
link |
02:03:43.580
I think Ed Witten is another, that everyone that meets them, they say he's just an intellectual
link |
02:03:50.660
powerhouse.
link |
02:03:51.660
Yes.
link |
02:03:52.660
Okay, so who's the honorable mention runner up?
link |
02:03:55.500
And this is, I mean, the reason I put it sort of in a separate section because this
link |
02:04:00.340
is a book that I reasonably recently listened to, so it's an audio book.
link |
02:04:08.260
And this is a book called Lab Girl by Hope Jahren.
link |
02:04:12.700
So Hope Jahren, she is a scientist, she's a geochemist that essentially studies the fossil
link |
02:04:23.860
plants.
link |
02:04:26.660
And so she uses this fossil, the chemical analysis to understand what was the climate
link |
02:04:35.020
back thousands, hundreds of thousands of years ago.
link |
02:04:40.580
And so something that incredibly touched me by this book, it was narrated by the author.
link |
02:04:49.020
And it's an incredibly personal story, incredibly.
link |
02:04:54.220
So certain parts of the book, you could actually hear the author crying.
link |
02:05:01.380
And that to me, I mean, I never experienced anything like this, you know, reading the
link |
02:05:05.460
book, but it was like, you know, the connection between you and the author.
link |
02:05:12.940
And I think this is, you know, this is really a must read, but even better, a must listen
link |
02:05:20.620
to audio book for anyone who wants to learn about sort of, you know, academia, science,
link |
02:05:29.220
and research in general, because it's a very personal account about her becoming a scientist.
link |
02:05:38.140
So we're just before New Year's, you know, we talked a lot about some difficult topics
link |
02:05:46.500
of viruses and so on.
link |
02:05:47.860
Do you have some exciting things you're looking forward to in 2021?
link |
02:05:54.460
Some New Year's resolutions, maybe silly or fun, or something very important and fundamental
link |
02:06:03.260
to the world of science or something completely unimportant?
link |
02:06:06.980
Well, I'm definitely looking forward to, you know, things becoming normal.
link |
02:06:14.780
Right.
link |
02:06:15.780
So, yes, so I really miss traveling.
link |
02:06:21.220
Every summer, I go to an international summer school, it's called the School for Molecular
link |
02:06:28.580
and Theoretical Biology, it's held in Europe, it's organized by very good friends of mine,
link |
02:06:34.580
and this is the school for gifted kids from all over the world, and they're incredibly
link |
02:06:40.740
bright.
link |
02:06:41.740
It's like every time I go there, it's like, you know, it's a highlight of the year.
link |
02:06:47.180
And we couldn't make it this August, so we did this school remotely, but it's different.
link |
02:06:55.780
So I am definitely looking forward to next August coming there.
link |
02:07:01.100
I also mean, you know, one of my, you know, personal resolutions, I realized that, you
link |
02:07:07.620
know, being in house and working from home, you know, I realized that actually I apparently
link |
02:07:18.460
missed a lot, you know, spending time with my family, believe it or not, so you typically,
link |
02:07:26.340
you know, with all the research and, you know, and teaching and everything related to the
link |
02:07:33.820
academic life, I mean, you get distracted.
link |
02:07:38.420
And so, so, you know, you don't feel that, you know, the fact that you are away from
link |
02:07:45.300
your family doesn't affect you because you're, you know, naturally distracted by other things.
link |
02:07:51.140
And you know, this time, I realized that, you know, that that's so important, right?
link |
02:07:58.460
Spending your time with the family, with your kids, and so that would be my new year resolution
link |
02:08:05.340
in actually trying to spend as much time as possible.
link |
02:08:09.700
Even when the world opens up, yeah, that's a beautiful message, that's a beautiful reminder.
link |
02:08:15.580
I asked you if there's a Russian poem you could read that I could force you to read,
link |
02:08:22.060
and you said, okay, fine, sure.
link |
02:08:26.420
Do you mind reading?
link |
02:08:27.420
Sure.
link |
02:08:28.420
I mean, you said that no paper needed, so.
link |
02:08:30.860
Nope.
link |
02:08:31.860
So, yeah, so this poem was written by my namesake, another Dmitry, Dmitry Kemerfeldt, and is
link |
02:08:40.180
a, you know, it's a recent poem, and it's called Sorceress, Vedma, in Russian, or actually
link |
02:08:51.100
Koldunya, so that's sort of another sort of connotation of Sorceress or witch.
link |
02:08:58.460
And I really like it, and it's one of just a handful of poems I actually can recall by heart.
link |
02:09:05.460
I also have a very strong association when I read this poem with Master and Margarita, the
link |
02:09:13.380
main female character, Margarita.
link |
02:09:18.380
And also it's, you know, it's about, you know, it's happening about the same time we're talking
link |
02:09:23.700
now, so around New Year, around Christmas.
link |
02:09:29.820
Do you mind reading it in Russian?
link |
02:09:32.980
I'll give it a try.
link |
02:09:34.980
And then I'll come back.
link |
02:09:58.980
and twisted the world.
link |
02:10:00.980
So you took the eyes of your hero,
link |
02:10:04.980
that anyone who came down to bless
link |
02:10:07.980
was ready to give the devil the devil's soul
link |
02:10:10.980
without looking at this witch's connection.
link |
02:10:13.980
There was a thief hanging around in the bushes,
link |
02:10:16.980
but I, without any prejudices and rags,
link |
02:10:19.980
ran out to feel your exhaled breath on my lips,
link |
02:10:24.980
so that the skin, with the tongue, with the ribs,
link |
02:10:28.980
would be on the other side of the earth,
link |
02:10:32.980
like you flew over the earth,
link |
02:10:35.980
in Belayv Yugi, Belayzibi, Belangliya.
link |
02:10:49.980
To me, it has a lot of meaning about this,
link |
02:10:54.980
something that is happening, something that is far away,
link |
02:10:59.980
but still very close to you.
link |
02:11:01.980
And, yes, it's the winter.
link |
02:11:05.980
There's something magical about winter, isn't there?
link |
02:11:07.980
Yes.
link |
02:11:08.980
What is the...
link |
02:11:09.980
Well, I don't know.
link |
02:11:10.980
I don't know how to translate it,
link |
02:11:11.980
but a kiss in winter is interesting.
link |
02:11:15.980
Lips in winter and all that kind of stuff.
link |
02:11:17.980
It's beautifully...
link |
02:11:18.980
I mean, Russian has a way.
link |
02:11:20.980
There's a reason Russian poetry is just...
link |
02:11:22.980
I'm a fan of poetry in both languages,
link |
02:11:24.980
but English doesn't capture some of the magic
link |
02:11:27.980
that Russian seems to, so thank you for doing that.
link |
02:11:30.980
That was awesome.
link |
02:11:31.980
Dmitry, it's great to talk to you again.
link |
02:11:34.980
You're...
link |
02:11:35.980
It's contagious how much you love what you do,
link |
02:11:38.980
how much you love life,
link |
02:11:39.980
so I really appreciate you taking the time to talk today.
link |
02:11:42.980
And thank you for having me.
link |
02:11:44.980
Thanks for listening to this conversation with Dmitry Korkin,
link |
02:11:47.980
and thank you to our sponsors, Brave Browser,
link |
02:11:50.980
NetSuite Business Management Software,
link |
02:11:53.980
Magic Spoon Low Carb Cereal,
link |
02:11:55.980
and Eight Sleep Self Cooling Mattress.
link |
02:11:58.980
So the choice is browsing privacy, business success,
link |
02:12:02.980
healthy diet, or comfortable sleep.
link |
02:12:05.980
Choose wisely, my friends,
link |
02:12:07.980
and if you wish, click the sponsor links below
link |
02:12:09.980
to get a discount and to support this podcast.
link |
02:12:12.980
And now, let me leave you with some words
link |
02:12:15.980
from Jeffrey Eugenides.
link |
02:12:17.980
Biology gives you a brain.
link |
02:12:20.980
Life turns it into a mind.
link |
02:12:23.980
Thank you for listening and hope to see you next time.