
Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI | Lex Fridman Podcast #153



link |
00:00:00.000
The following is a conversation with Dmitry Korkin,
link |
00:00:02.860
his second time on the podcast.
link |
00:00:04.820
He's a professor of bioinformatics
link |
00:00:06.980
and computational biology at WPI,
link |
00:00:09.740
where he specializes in bioinformatics of complex disease,
link |
00:00:13.540
computational genomics, systems biology,
link |
00:00:16.300
and biomedical data analytics.
link |
00:00:18.540
He loves biology, he loves computing,
link |
00:00:22.080
plus he is Russian and recites a poem in Russian
link |
00:00:26.140
at the end of the podcast.
link |
00:00:27.760
What else could you possibly ask for in this world?
link |
00:00:31.080
Quick mention of our sponsors.
link |
00:00:32.960
Brave Browser, NetSuite Business Management Software,
link |
00:00:37.760
Magic Spoon Low Carb Cereal,
link |
00:00:40.280
and 8sleep Self Cooling Mattress.
link |
00:00:42.920
So the choice is browsing privacy, business success,
link |
00:00:46.360
healthy diet, or comfortable sleep.
link |
00:00:49.180
Choose wisely, my friends,
link |
00:00:50.660
and if you wish, click the sponsor links below
link |
00:00:53.640
to get a discount and to support this podcast.
link |
00:00:56.440
As a side note, let me say that to me,
link |
00:00:58.600
the scientists that did the best apolitical,
link |
00:01:01.520
impactful, brilliant work of 2020
link |
00:01:04.000
are the biologists who study viruses without an agenda,
link |
00:01:09.160
without much sleep, to be honest,
link |
00:01:11.800
just a pure passion for scientific discovery
link |
00:01:14.460
and exploration of the mysteries within viruses.
link |
00:01:18.400
Viruses are both terrifying and beautiful.
link |
00:01:21.340
Terrifying because they can threaten
link |
00:01:22.960
the fabric of human civilization,
link |
00:01:25.120
both biological and psychological.
link |
00:01:27.840
Beautiful because they give us insights
link |
00:01:30.480
into the nature of life on Earth
link |
00:01:32.920
and perhaps even extraterrestrial life
link |
00:01:35.900
of the not so intelligent variety
link |
00:01:37.960
that might meet us one day
link |
00:01:39.560
as we explore the habitable planets
link |
00:01:41.520
and moons in our universe.
link |
00:01:43.760
If you enjoy this thing, subscribe on YouTube,
link |
00:01:45.800
review it on Apple Podcast, follow on Spotify,
link |
00:01:49.040
support on Patreon, or connect with me on Twitter
link |
00:01:51.740
at Lex Fridman.
link |
00:01:53.160
And now here's my conversation with Dmitry Korkin.
link |
00:01:57.920
It's often said that proteins
link |
00:02:00.680
and the amino acid residues that make them up
link |
00:02:04.200
are the building blocks of life.
link |
00:02:06.400
Do you think of proteins in this way
link |
00:02:08.000
as the basic building blocks of life?
link |
00:02:11.160
Yes and no.
link |
00:02:12.200
So the protein indeed is the basic unit,
link |
00:02:16.300
biological unit that carries out
link |
00:02:20.480
important function of the cell.
link |
00:02:22.800
However, through studying the proteins
link |
00:02:25.800
and comparing the proteins across different species,
link |
00:02:29.360
across different kingdoms,
link |
00:02:31.440
you realize that proteins are actually
link |
00:02:34.640
much more complicated.
link |
00:02:36.760
So they have so called modular complexity.
link |
00:02:42.280
And so what I mean by that is an average protein
link |
00:02:47.280
consists of several structural units.
link |
00:02:54.760
So we call them protein domains.
link |
00:02:57.440
And so you can imagine a protein as a string of beads
link |
00:03:02.580
where each bead is a protein domain.
link |
00:03:05.760
And in the past 20 years,
link |
00:03:10.240
scientists have been studying
link |
00:03:13.040
the nature of the protein domains
link |
00:03:15.040
because we realize that it's the unit.
link |
00:03:19.480
Because if you look at the functions, right?
link |
00:03:22.120
So many proteins have more than one function
link |
00:03:25.880
and those protein functions are often carried out
link |
00:03:29.440
by those protein domains.
link |
00:03:31.560
So we also see that in the evolution,
link |
00:03:37.320
those protein domains get shuffled.
link |
00:03:40.160
So they act actually as a unit.
link |
00:03:43.460
Also from the structural perspective, right?
link |
00:03:45.280
So some people think of a protein
link |
00:03:50.960
as a sort of a globular molecule,
link |
00:03:55.320
but as a matter of fact,
link |
00:03:56.800
the globular part of this protein is a protein domain.
link |
00:04:02.500
So we often have this, again,
link |
00:04:06.000
the collection of these protein domains
link |
00:04:09.600
aligned on a string as beads.
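
The beads-on-a-string picture can be sketched as a small data structure. This is only an illustration under stated assumptions: the protein name, the domain labels, and every residue boundary below are made up, and domains are assumed not to overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Domain:
    """One 'bead': a contiguous stretch of residues (hypothetical boundaries)."""
    name: str
    start: int  # 1-based index of the first residue in the domain
    end: int    # index of the last residue, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


@dataclass
class Protein:
    """The 'string': a chain of residues carrying an ordered list of domains."""
    name: str
    length: int
    domains: List[Domain]

    def linker_residues(self) -> int:
        # Residues outside every domain, i.e. the string between the beads.
        # Assumes the domains do not overlap.
        return self.length - sum(d.length for d in self.domains)


# A made-up three-domain protein, purely for illustration.
toy = Protein(
    name="toy_protein",
    length=300,
    domains=[Domain("A", 10, 110), Domain("B", 130, 220), Domain("C", 240, 295)],
)

for d in toy.domains:
    print(f"domain {d.name}: residues {d.start}-{d.end} ({d.length} aa)")
print("linker residues:", toy.linker_residues())
```

Keeping each domain as its own record mirrors the point made in the conversation: the domain, not the whole chain, is the unit that carries function and gets shuffled in evolution.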
link |
00:04:14.800
And the protein domains are made up of amino acid residues.
link |
00:04:17.880
So we're talking about.
link |
00:04:18.720
So this is the basic,
link |
00:04:20.640
so you're saying the protein domain
link |
00:04:22.600
is the basic building block of the function
link |
00:04:25.640
that we think about proteins doing.
link |
00:04:28.320
So of course you can always talk
link |
00:04:30.200
about different building blocks.
link |
00:04:31.520
It's turtles all the way down.
link |
00:04:32.880
But there's a point where there is,
link |
00:04:36.280
at the point of the hierarchy
link |
00:04:37.680
where it's the most, the cleanest element block
link |
00:04:43.640
based on which you can put them together
link |
00:04:46.240
in different kinds of ways to form complex function.
link |
00:04:49.200
And you're saying protein domains,
link |
00:04:50.880
why is that not talked about as often in popular culture?
link |
00:04:55.160
Well, there are several perspectives on this.
link |
00:04:59.280
And one of course is the historical perspective, right?
link |
00:05:03.200
So historically scientists have been able
link |
00:05:07.800
to structurally resolve,
link |
00:05:10.120
to obtain the 3D coordinates of a protein
link |
00:05:14.240
for smaller proteins.
link |
00:05:17.520
And smaller proteins tend to be a single domain protein.
link |
00:05:21.000
So we have a protein equal to a protein domain.
link |
00:05:24.000
And so because of that,
link |
00:05:26.040
the initial suspicion was that the proteins are,
link |
00:05:29.640
they have globular shapes
link |
00:05:31.680
and the more of smaller proteins you obtain structurally,
link |
00:05:36.840
the more you became convinced that that's the case.
link |
00:05:41.720
And only later when we started having
link |
00:05:47.920
alternative approaches.
link |
00:05:49.640
So the traditional ones are X ray crystallography
link |
00:05:55.920
and NMR spectroscopy.
link |
00:05:57.320
So these are sort of the two main techniques
link |
00:06:02.000
that give us the 3D coordinates.
link |
00:06:04.440
But nowadays there's a huge breakthrough
link |
00:06:07.760
in cryo electron microscopy.
link |
00:06:10.480
So the more advanced methods that allow us
link |
00:06:13.840
to get into the 3D shapes of much larger molecules,
link |
00:06:21.560
molecular complexes,
link |
00:06:23.480
just to give you one of the common examples
link |
00:06:28.120
for this year, right?
link |
00:06:29.440
So the first experimental structure
link |
00:06:32.760
of a SARS-CoV-2 protein
link |
00:06:35.920
was the cryo EM structure of the S protein.
link |
00:06:40.160
So the spike protein.
link |
00:06:41.960
And so it was solved very quickly.
link |
00:06:46.320
And the reason for that is the advancement
link |
00:06:49.480
of this technology is pretty spectacular.
link |
00:06:53.920
How many domains does the, is it more than one domain?
link |
00:06:57.480
Oh yes.
link |
00:06:58.320
Oh yes, I mean, so it's a very complex structure.
link |
00:07:01.320
And we, you know, on top of the complexity
link |
00:07:06.480
of a single protein, right?
link |
00:07:08.520
So this structure is actually a complex, it's a trimer.
link |
00:07:13.720
So it needs to form a trimer in order to function properly.
link |
00:07:17.640
What's a complex?
link |
00:07:18.720
So a complex is an agglomeration of multiple proteins.
link |
00:07:22.880
And so we can have the same protein copied in multiple,
link |
00:07:29.280
you know, made up in multiple copies
link |
00:07:32.080
and forming something that we call a homo oligomer.
link |
00:07:36.160
Homo means the same, right?
link |
00:07:38.120
So in this case, so the spike protein is the,
link |
00:07:42.800
is an example of a homo tetram, homo trimer, sorry.
link |
00:07:46.720
So you need three copies of it?
link |
00:07:48.120
Three copies.
link |
00:07:48.960
In order to.
link |
00:07:50.040
Exactly.
link |
00:07:50.880
We have these three chains,
link |
00:07:52.760
the three molecular chains coupled together
link |
00:07:56.800
and performing the function.
link |
00:07:58.480
That's what, when you look at this protein from the top,
link |
00:08:02.380
you see a perfect triangle.
link |
00:08:03.920
Yeah.
link |
00:08:04.760
So, but other, you know,
link |
00:08:07.120
so other complexes are made up of, you know,
link |
00:08:10.640
different proteins.
link |
00:08:12.840
Some of them are completely different.
link |
00:08:15.400
Some of them are similar.
link |
00:08:16.920
The hemoglobin molecule, right?
link |
00:08:18.880
So it's actually, it's a protein complex.
link |
00:08:21.880
It's made of four basic subunits.
link |
00:08:25.760
Two of them are identical to each other.
link |
00:08:29.040
The two others are identical to each other,
link |
00:08:30.800
but they are also similar to each other,
link |
00:08:32.820
which sort of gives us some ideas about the evolution
link |
00:08:36.960
of this, you know, of this molecule.
link |
00:08:40.640
And perhaps, so one of the hypotheses is that, you know,
link |
00:08:44.000
in the past, it was just a homo tetramer, right?
link |
00:08:48.280
So four identical copies,
link |
00:08:50.840
and then it became, you know, sort of modified,
link |
00:08:55.520
it became mutated over the time
link |
00:08:58.520
and became more specialized.
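
The naming scheme described here (homo means identical subunits, and the stem counts the copies) can be sketched as a small classifier. This is a minimal illustration, not a standard tool: the function name `classify_complex` and the chain labels are assumptions, and hemoglobin's two pairs of subunits are simply labeled "alpha" and "beta".

```python
from collections import Counter

# Stems for the number of subunits in a complex.
PREFIX = {2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def classify_complex(chains):
    """Name a complex from a list of chain-type labels, one label per subunit.

    Hypothetical helper for illustration: identical labels mean identical chains.
    """
    n = len(chains)
    if n == 1:
        return "monomer"
    if n not in PREFIX:
        raise ValueError(f"no stem defined for {n} subunits")
    kinds = Counter(chains)
    # One kind of chain -> homo oligomer; several kinds -> hetero oligomer.
    origin = "homo" if len(kinds) == 1 else "hetero"
    return f"{origin}{PREFIX[n]}mer"


print(classify_complex(["S", "S", "S"]))                       # spike
print(classify_complex(["alpha", "alpha", "beta", "beta"]))    # hemoglobin
```

Under this scheme the hypothesized ancestral hemoglobin, four identical copies, would be classified as a homotetramer, while the present-day molecule is a heterotetramer.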
link |
00:09:00.160
Can we linger on the spike protein for a little bit?
link |
00:09:02.560
Is there something interesting
link |
00:09:04.940
or like beautiful you find about it?
link |
00:09:06.960
I mean, first of all,
link |
00:09:07.880
it's an incredibly challenging protein.
link |
00:09:10.960
And so we, as a part of our sort of research
link |
00:09:16.120
to understand the structural basis of this virus,
link |
00:09:20.200
to sort of decode, structurally decode,
link |
00:09:22.760
every single protein in its proteome,
link |
00:09:27.560
which, you know, we've been working on this spike protein.
link |
00:09:31.800
And one of the main challenges was that the cryoEM data
link |
00:09:36.800
allows us to reconstruct or to obtain the 3D coordinates
link |
00:09:44.640
of roughly two thirds of the protein.
link |
00:09:48.040
The rest of the one third of this protein,
link |
00:09:51.960
it's a part that is buried into the membrane of the virus
link |
00:09:58.400
and of the viral envelope.
link |
00:10:01.560
And it also has a lot of unstable structures around it.
link |
00:10:06.920
So it's chemically interacting somehow
link |
00:10:08.640
with whatever the heck it's connecting to.
link |
00:10:10.160
Yeah, so people are still trying to understand.
link |
00:10:12.800
So the nature of, and the role of this one third,
link |
00:10:18.600
because the top part, you know, the primary function
link |
00:10:23.120
is to get attached to the ACE2 receptor, human receptor.
link |
00:10:28.120
There is also beautiful mechanics
link |
00:10:32.600
of how this thing happens, right?
link |
00:10:34.720
So because there are three different copies of these chains,
link |
00:10:39.800
you know, there are three different domains, right?
link |
00:10:43.480
So we're talking about domains.
link |
00:10:44.800
So these are the receptor binding domains, RBDs,
link |
00:10:47.840
that get untangled and get ready to get attached
link |
00:10:53.840
to the receptor.
link |
00:10:55.400
And now they are not necessarily going in a sync mode.
link |
00:11:02.760
As a matter of fact.
link |
00:11:04.080
It's asynchronous.
link |
00:11:05.240
So yes, and this is where another level of complexity
link |
00:11:11.000
comes into play because right now what we see is,
link |
00:11:16.000
we typically see just one of the arms going out
link |
00:11:20.520
and getting ready to be attached to the ACE2 receptors.
link |
00:11:27.560
However, there was a recent mutation
link |
00:11:30.360
that people studied in that spike protein.
link |
00:11:35.080
And very recently, a group from UMass Medical School
link |
00:11:43.560
we happen to collaborate with the group.
link |
00:11:45.280
So this is the group of Jeremy Luban
link |
00:11:47.240
and a number of other faculty.
link |
00:11:51.560
They actually solved the mutated structure of the spike.
link |
00:11:59.000
And they showed that actually, because of these mutations,
link |
00:12:03.000
you have more than one arm opening up.
link |
00:12:08.880
And so now, so the frequency of two arms going up
link |
00:12:13.880
increased quite drastically.
link |
00:12:17.200
Interesting.
link |
00:12:18.040
Does that change the dynamics somehow?
link |
00:12:20.120
It potentially can change the dynamics
link |
00:12:22.920
because now you have two possible opportunities
link |
00:12:27.280
to get attached to the ACE2 receptor.
link |
00:12:30.000
It's a very complex molecular process, mechanistic process.
link |
00:12:34.120
But the first step of this process is the attachment
link |
00:12:38.280
of this spike protein, of the spike trimer
link |
00:12:42.560
to the human ACE2 receptor.
link |
00:12:46.600
So this is a molecule that sits
link |
00:12:48.880
on the surface of the human cell.
link |
00:12:51.920
And that's essentially what initiates,
link |
00:12:54.720
what triggers the whole process of encapsulation.
link |
00:12:58.880
If this was dating, this would be the first date.
link |
00:13:01.440
So this is the...
link |
00:13:03.160
In a way.
link |
00:13:04.200
Yes.
link |
00:13:05.640
So is it possible to have the spike protein
link |
00:13:07.920
just like floating about on its own?
link |
00:13:10.640
Or does it need that interactability with the membrane?
link |
00:13:14.680
Yeah, so it needs to be attached,
link |
00:13:16.920
at least as far as I know.
link |
00:13:19.040
But when you get this thing attached on the surface,
link |
00:13:23.320
there is also a lot of dynamics
link |
00:13:25.120
on how it sits on the surface.
link |
00:13:28.200
So for example, there was a recent work in,
link |
00:13:32.200
again, where people use the cryo electron microscopy
link |
00:13:35.800
to get the first glimpse of the overall structure.
link |
00:13:38.960
It's a very low res, but you still get
link |
00:13:41.600
some interesting details about the surface,
link |
00:13:45.160
about what is happening inside,
link |
00:13:47.040
because we had literally no clue until recent work
link |
00:13:50.760
about how the capsid is organized.
link |
00:13:54.520
What's a capsid?
link |
00:13:55.360
So a capsid is essentially,
link |
00:13:56.720
it's the inner core of the viral particle
link |
00:14:01.040
where there is the RNA of the virus,
link |
00:14:05.000
and it's protected by another protein, N protein,
link |
00:14:10.280
that essentially acts as a shield.
link |
00:14:13.440
But now we are learning more and more,
link |
00:14:16.520
so it's actually, it's not just this shield,
link |
00:14:18.600
it potentially is used for the stability
link |
00:14:21.800
of the outer shell of the virus.
link |
00:14:25.040
So it's pretty complicated.
link |
00:14:27.840
And I mean, understanding all of this is really useful
link |
00:14:30.480
for trying to figure out like developing a vaccine
link |
00:14:33.000
or some kind of drug to attack,
link |
00:14:34.680
any aspects of this, right?
link |
00:14:36.040
So, I mean, there are many different implications to that.
link |
00:14:39.640
First of all, it's important to understand
link |
00:14:43.040
the virus itself, right?
link |
00:14:44.560
So in order to understand how it acts,
link |
00:14:51.560
what is the overall mechanistic process
link |
00:14:55.320
of this virus replication,
link |
00:14:57.320
of this virus proliferation to the cell, right?
link |
00:15:00.560
So that's one aspect.
link |
00:15:03.000
The other aspect is designing new treatments.
link |
00:15:06.480
So one of the possible treatments
link |
00:15:09.040
is designing nanoparticles.
link |
00:15:12.480
And so some nanoparticles that will resemble the viral shape
link |
00:15:17.200
that would have the spike integrated,
link |
00:15:19.520
and essentially would act as a competitor to the real virus
link |
00:15:23.680
by blocking the ACE2 receptors,
link |
00:15:26.680
and thus preventing the real virus entering the cell.
link |
00:15:30.400
Now, there are also, you know,
link |
00:15:32.920
there is a very interesting direction
link |
00:15:35.600
in looking at the membrane,
link |
00:15:38.320
at the envelope portion of the protein
link |
00:15:40.960
and attacking its M protein.
link |
00:15:44.880
So there are, you know, to give you a, you know,
link |
00:15:48.320
sort of a brief overview,
link |
00:15:50.120
there are four structural proteins.
link |
00:15:52.320
These are the proteins that make up
link |
00:15:54.560
a structure of the virus.
link |
00:15:58.160
So spike, S protein that acts as a trimer,
link |
00:16:02.920
so it needs three copies.
link |
00:16:06.080
E, envelope protein that acts as a pentamer,
link |
00:16:09.720
so it needs five copies to act properly.
link |
00:16:13.400
M is a membrane protein, it forms dimers,
link |
00:16:18.600
and actually it forms beautiful lattice.
link |
00:16:20.560
And this is something that we've been studying
link |
00:16:22.480
and we are seeing it in simulations.
link |
00:16:24.520
It actually forms a very nice grid
link |
00:16:26.920
or, you know, threads, you know,
link |
00:16:30.600
of different dimers attached next to each other.
link |
00:16:33.600
Just a bunch of copies of each other,
link |
00:16:34.960
and they naturally, when you have a bunch of copies
link |
00:16:36.960
of each other, they form an interesting lattice.
link |
00:16:38.960
Exactly.
link |
00:16:39.800
And, you know, if you think about this, right?
link |
00:16:42.280
So this complex, you know, the viral shape
link |
00:16:48.160
needs to be organized somehow, self organized somehow, right?
link |
00:16:52.160
So it, you know, if it was a completely random process,
link |
00:16:56.160
you know, you probably wouldn't have the envelope shell
link |
00:17:02.080
of the ellipsoid shape, you know,
link |
00:17:03.920
you would have something, you know,
link |
00:17:05.880
pretty random, right, shape.
link |
00:17:07.600
So there is some, you know, regularity
link |
00:17:10.560
in how these, you know, how these M dimers
link |
00:17:16.720
get to attach to each other
link |
00:17:18.480
in a very specific directed way.
link |
00:17:20.520
Is that understood at all?
link |
00:17:23.080
It's not understood.
link |
00:17:24.280
We are now, we've been working in the past six months
link |
00:17:28.400
since, you know, we met, actually,
link |
00:17:30.160
this is where we started working on trying to understand
link |
00:17:33.400
the overall structure of the envelope
link |
00:17:36.280
and the key components that make up this, you know,
link |
00:17:40.640
structure.
link |
00:17:41.480
Wait, does the envelope also have the lattice structure
link |
00:17:43.240
or no?
link |
00:17:44.080
So the envelope is essentially the outer shell
link |
00:17:47.360
of the viral particle.
link |
00:17:48.800
The N, the nucleocapsid protein,
link |
00:17:51.600
is something that is inside.
link |
00:17:53.960
Got it.
link |
00:17:54.800
But get that, the N is likely to interact with M.
link |
00:17:59.520
Does it go M and E?
link |
00:18:01.480
Like, where's the E and the M?
link |
00:18:02.880
So E, those different proteins,
link |
00:18:05.640
they occur in different copies on the viral particle.
link |
00:18:10.800
So E, this pentamer complex,
link |
00:18:13.960
we only have two or three, maybe, per each particle, okay?
link |
00:18:18.960
We have a thousand or so M dimers
link |
00:18:24.520
that essentially made up,
link |
00:18:26.600
that makes up the entire, you know, outer shell.
link |
00:18:30.920
So most of the outer shell is the M.
link |
00:18:33.680
M dimer.
link |
00:18:34.520
And the M protein.
link |
00:18:35.640
When you say particle, that's the virion,
link |
00:18:38.160
the virus, the individual virus.
link |
00:18:40.120
It's a single, yes.
link |
00:18:40.960
Single element of the virus, it's a single virus.
link |
00:18:43.640
Single virus, right.
link |
00:18:45.080
And we have about, you know, roughly 50 to 90 spike trimers.
link |
00:18:50.840
Right?
link |
00:18:51.680
So when you, you know, when you show a...
link |
00:18:54.000
Per virus particle.
link |
00:18:55.040
Per virus particle.
link |
00:18:56.560
Sorry, what did you say, 50 to 90?
link |
00:18:58.680
50 to 90, right?
link |
00:19:00.680
So this is how this thing is organized.
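
The copy numbers quoted in the conversation can be turned into a rough per-particle chain count. This is back-of-the-envelope arithmetic only: the figures are the approximate ones stated above (50 to 90 spike trimers, two or three E pentamers, about a thousand M dimers), and the dictionary and function names are made up for the sketch.

```python
# Chains per complex, as described in the conversation:
# S acts as a trimer, E as a pentamer, M as a dimer.
CHAINS_PER_COMPLEX = {"S": 3, "E": 5, "M": 2}

# Approximate (low, high) number of complexes per viral particle,
# taken from the rough figures quoted above; M is "a thousand or so".
COMPLEXES_PER_PARTICLE = {"S": (50, 90), "E": (2, 3), "M": (1000, 1000)}


def chain_range(protein):
    """Return the (low, high) number of individual chains per particle."""
    low, high = COMPLEXES_PER_PARTICLE[protein]
    k = CHAINS_PER_COMPLEX[protein]
    return low * k, high * k


for protein in ("S", "E", "M"):
    low, high = chain_range(protein)
    print(f"{protein}: roughly {low} to {high} chains per particle")
```

Even with these rough numbers, M chains outnumber spike chains several times over, which fits the remark that most of the outer shell is made of M dimers.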
link |
00:19:04.000
And so now, typically, right,
link |
00:19:06.400
so you see these, the antibodies that target,
link |
00:19:11.480
you know, spike protein,
link |
00:19:13.240
certain parts of the spike protein,
link |
00:19:15.200
but there could be some, also some treatments, right?
link |
00:19:17.960
So these are, you know, these are small molecules
link |
00:19:22.000
that bind strategic parts of these proteins,
link |
00:19:27.520
disrupting its function.
link |
00:19:29.680
So one of the promising directions,
link |
00:19:34.040
it's one of the newest directions,
link |
00:19:35.600
is actually targeting the M dimer of the protein.
link |
00:19:40.600
Targeting the proteins that make up this outer shell.
link |
00:19:44.120
Because if you're able to destroy the outer shell,
link |
00:19:47.640
you're essentially destroying the viral particle itself.
link |
00:19:52.160
So preventing it from, you know, functioning at all.
link |
00:19:56.720
So that's, you think is,
link |
00:19:59.160
from a sort of cyber security perspective,
link |
00:20:01.440
virus security perspective,
link |
00:20:02.960
that's the best attack vector?
link |
00:20:05.160
Is, or like, that's a promising attack vector?
link |
00:20:08.440
I would say, yeah.
link |
00:20:09.280
So, I mean, there's still tons of research that needs to be,
link |
00:20:12.680
you know, to be done.
link |
00:20:14.000
But yes, I think, you know, so.
link |
00:20:16.560
There's more attack surface, I guess.
link |
00:20:18.880
More attack surface.
link |
00:20:19.880
But, you know, from our analysis,
link |
00:20:22.280
from other evolutionary analysis,
link |
00:20:24.200
this protein is evolutionarily more stable
link |
00:20:28.000
compared to the, say, to the spike protein.
link |
00:20:31.200
Oh, and stable means a more static target?
link |
00:20:35.520
Well, yeah, so it doesn't change.
link |
00:20:38.400
It doesn't evolve from the evolutionary perspective
link |
00:20:42.120
so drastically as, for example, the spike protein.
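
One common way to put a number on "evolutionarily more stable" is per-column Shannon entropy over a sequence alignment: the less a position varies across sequences, the lower its entropy. The sketch below is not the analysis referred to in the conversation; the two alignments are made-up toy strings, not real viral sequences, and the function names are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one position."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_entropy(alignment):
    """Average per-column entropy of equal-length aligned sequences."""
    columns = list(zip(*alignment))  # transpose: one tuple per position
    return sum(column_entropy(col) for col in columns) / len(columns)


# Made-up toy alignments, purely illustrative.
stable_toy = ["MKTL", "MKTL", "MKTL", "MKTL"]    # no variation at any position
variable_toy = ["MKTL", "MRTL", "MKSL", "MKTV"]  # one substitution in three columns

print("stable toy alignment:  ", round(mean_entropy(stable_toy), 3))
print("variable toy alignment:", round(mean_entropy(variable_toy), 3))
```

A lower mean entropy means the sequences change less from one variant to the next, which is the sense in which a protein would be a more static target.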
link |
00:20:46.000
There's a bunch of stuff in the news
link |
00:20:47.960
about mutations of the virus in the United Kingdom.
link |
00:20:51.400
I also saw in South Africa something.
link |
00:20:54.160
Maybe that was yesterday.
link |
00:20:56.360
You just kind of mentioned about stability and so on.
link |
00:21:00.200
Which aspects of this are mutatable
link |
00:21:02.800
and which aspects, if mutated, become more dangerous?
link |
00:21:07.600
And maybe even zooming out,
link |
00:21:09.280
what are your thoughts and knowledge and ideas
link |
00:21:12.080
about the way it's mutated,
link |
00:21:13.680
all the news that we've been hearing?
link |
00:21:15.360
Are you worried about it from a biological perspective?
link |
00:21:18.520
Are you worried about it from a human perspective?
link |
00:21:21.280
So, I mean, you know, mutations are sort of a general way
link |
00:21:26.320
for these viruses to evolve, right?
link |
00:21:28.640
So, it's, you know, it's essentially,
link |
00:21:32.680
this is the way they evolve.
link |
00:21:34.760
This is the way they were able to jump
link |
00:21:38.680
from one species to another.
link |
00:21:42.080
We also see some recent jumps.
link |
00:21:46.800
There were some incidents of this virus jumping
link |
00:21:50.000
from human to dogs.
link |
00:21:51.880
So, you know, there is some danger in those jumps
link |
00:21:55.880
because every time it jumps, it also mutates, right?
link |
00:21:59.520
So, when it jumps to the species
link |
00:22:04.400
and jumps back, right?
link |
00:22:06.160
So, it acquires some mutations
link |
00:22:08.320
that are sort of driven by the environment
link |
00:22:14.360
of a new host, right?
link |
00:22:16.360
And it's different from the human environment.
link |
00:22:19.280
And so, we don't know whether the mutations
link |
00:22:21.480
that are acquired in the new species
link |
00:22:24.920
are neutral with respect to the human host
link |
00:22:28.160
or maybe, you know, maybe damaging.
link |
00:22:32.080
Yeah, change is always scary, but so are you worried about,
link |
00:22:36.560
I mean, it seems like because the spread is,
link |
00:22:38.960
during winter now, seems to be exceptionally high
link |
00:22:43.560
and especially with a vaccine just around the corner
link |
00:22:46.760
already being actually deployed,
link |
00:22:49.160
is there some worry that this puts evolutionary pressure,
link |
00:22:53.000
selective pressure on the virus for it to mutate?
link |
00:22:59.000
Is that a source of worry?
link |
00:23:00.440
Well, I mean, there is always this thought
link |
00:23:03.440
in the scientist's mind, you know, what will happen, right?
link |
00:23:08.720
So, I know there've been discussions
link |
00:23:12.520
about sort of the arms race between the ability
link |
00:23:17.600
of the humanity to get vaccinated faster
link |
00:23:22.600
than the virus, you know, essentially, you know,
link |
00:23:27.600
it becomes, you know, resistant to the vaccine.
link |
00:23:34.200
I mean, I don't worry that much simply because,
link |
00:23:40.920
you know, there is not that much evidence to that.
link |
00:23:44.920
To aggressive mutation around the vaccine.
link |
00:23:47.440
Exactly, you know, obviously there are mutations
link |
00:23:49.960
around the vaccine, so the reason we get vaccinated
link |
00:23:56.080
every year against the seasonal mutations, right?
link |
00:24:01.280
But, you know, I think it's important to study it.
link |
00:24:06.120
No doubts, right?
link |
00:24:07.120
So, I think one of the, you know, to me,
link |
00:24:10.120
and again, I might be biased because, you know,
link |
00:24:14.120
we've been trying to do that as well,
link |
00:24:17.120
so, but one of the critical directions
link |
00:24:20.120
in understanding the virus is to understand its evolution
link |
00:24:23.920
in order to sort of understand the mechanisms,
link |
00:24:27.480
the key mechanisms that lead the virus to jump,
link |
00:24:30.960
you know, the zoonotic viruses to jump from species,
link |
00:24:34.240
from one species to another, the mechanisms
link |
00:24:37.440
that lead the virus to become resistant to vaccines,
link |
00:24:42.480
also to treatments, right?
link |
00:24:44.680
And hopefully that knowledge will enable us
link |
00:24:48.520
to sort of forecast the evolutionary traces,
link |
00:24:52.520
the future evolutionary traces of this virus.
link |
00:24:55.160
I mean, what, from a biological perspective,
link |
00:24:58.080
this might be a dumb question,
link |
00:24:59.320
but is there parts of the virus that if souped up,
link |
00:25:05.080
like through mutation, could make it more effective
link |
00:25:09.080
at doing its job?
link |
00:25:09.920
We're talking about this specific coronavirus
link |
00:25:12.520
because we were talking about the different, like,
link |
00:25:14.880
the membrane, the M protein, the E protein,
link |
00:25:18.440
the N and the S, the spike, is there some?
link |
00:25:24.080
And there are 20 or so more in addition to that.
link |
00:25:27.880
But is that a dumb way to look at it?
link |
00:25:29.840
Like, which of these, if mutated,
link |
00:25:34.520
could have the greatest impact, potentially damaging impact,
link |
00:25:39.640
on the effectiveness of the virus?
link |
00:25:41.520
So it's actually, it's a very good question
link |
00:25:44.520
because, and the short answer is, we don't know yet.
link |
00:25:48.120
But of course there is capacity of this virus
link |
00:25:51.560
to become more efficient.
link |
00:25:53.560
The reason for that is, you know,
link |
00:25:56.680
so if you look at the virus, I mean, it's a machine, right?
link |
00:25:59.760
So it's a machine that does a lot of different functions,
link |
00:26:03.520
and many of these functions are sort of nearly perfect,
link |
00:26:06.520
but they're not perfect.
link |
00:26:07.840
And those mutations can have the greatest impact
link |
00:26:11.360
and make those functions more perfect.
link |
00:26:14.120
For example, the attachment to ACE2 receptor, right,
link |
00:26:18.240
of the spike, right?
link |
00:26:19.400
So, you know, has this virus reached the efficiency
link |
00:26:28.360
in which the attachment is carried out?
link |
00:26:31.560
Or are there some mutations that are still to be discovered,
link |
00:26:36.080
right, that will make this attachment sort of stronger,
link |
00:26:41.920
or, you know, something more, in a way more efficient
link |
00:26:48.560
from the point of view of this virus functioning.
link |
00:26:51.880
That's sort of the obvious example.
link |
00:26:54.640
But if you look at each of these proteins,
link |
00:26:57.480
I mean, it's there for a reason,
link |
00:26:58.840
it performs a certain function.
link |
00:27:00.760
And it could be that certain mutations will, you know,
link |
00:27:07.120
enhance this function.
link |
00:27:08.480
It could be that some mutations will make this function
link |
00:27:11.560
much less efficient, right?
link |
00:27:13.720
So that's also the case.
link |
00:27:16.200
Let's, since we're talking about the evolutionary history
link |
00:27:18.880
of a virus, let's zoom back out
link |
00:27:22.720
and look at the evolution of proteins.
link |
00:27:25.240
I glanced at this 2010 Nature paper
link |
00:27:29.960
on the quote, ongoing expansion of the protein universe.
link |
00:27:34.320
And then, you know, it kind of implies and talks about
link |
00:27:39.480
that proteins started with a common ancestor,
link |
00:27:42.520
which is, you know, kind of interesting.
link |
00:27:44.720
It's interesting to think about like,
link |
00:27:45.960
even just like the first organic thing
link |
00:27:49.840
that started life on Earth.
link |
00:27:51.840
And from that, there's now, you know, what is it?
link |
00:27:56.000
3.5 billion years later, there's now millions of proteins.
link |
00:27:59.880
And they're still evolving.
link |
00:28:01.320
And that's, you know, in part,
link |
00:28:02.960
one of the things that you're researching.
link |
00:28:05.000
Is there something interesting to you about the evolution
link |
00:28:09.200
of proteins from this initial ancestor to today?
link |
00:28:14.600
Is there something beautiful and insightful
link |
00:28:16.280
about this long story?
link |
00:28:18.120
So I think, you know, if I were to pick a single keyword
link |
00:28:24.120
about protein evolution, I would pick modularity,
link |
00:28:29.120
something that we talked about in the beginning.
link |
00:28:32.720
And that's the fact that the proteins are no longer
link |
00:28:36.960
considered as, you know, as a sequence of letters.
link |
00:28:41.280
There are hierarchical complexities
link |
00:28:45.880
in the way these proteins are organized.
link |
00:28:48.240
And these complexities are actually going
link |
00:28:51.720
beyond the protein sequence.
link |
00:28:53.920
It's actually going all the way back to the gene,
link |
00:28:57.720
to the nucleotide sequence.
link |
00:29:00.000
And so, you know, again, these protein domains,
link |
00:29:04.840
they are not only functional building blocks,
link |
00:29:07.840
they are also evolutionary building blocks.
link |
00:29:09.920
And so what we see in the sort of,
link |
00:29:12.560
in the later stages of evolution,
link |
00:29:15.120
I mean, once this stable structurally
link |
00:29:18.720
and functionally building blocks were discovered,
link |
00:29:22.040
they essentially, they stay, those domains stay as such.
link |
00:29:28.040
So that's why if you start comparing different proteins,
link |
00:29:31.560
you will see that many of them will have similar fragments.
link |
00:29:37.400
And those fragments will correspond to something
link |
00:29:39.640
that we call protein domain families.
link |
00:29:42.280
And so they are still different
link |
00:29:44.040
because you still have mutations and, you know,
link |
00:29:48.520
the, you know, different mutations are attributed to,
link |
00:29:53.200
to, you know, diversification of the function
link |
00:29:56.200
of this, you know, protein domains.
link |
00:29:58.840
However, you don't, you very rarely see, you know,
link |
00:30:03.520
the evolutionary events that would split
link |
00:30:07.840
this domain into fragments because,
link |
00:30:10.520
and it's, you know, once you have the domain split,
link |
00:30:17.240
you actually, you, you know,
link |
00:30:20.240
you can completely cancel out its function
link |
00:30:24.000
or at the very least you can reduce it.
link |
00:30:26.600
And that's not, you know, efficient from the point of view
link |
00:30:29.640
of the, you know, of the cell functioning.
link |
00:30:32.880
So, so the, the, the protein domain level
link |
00:30:37.240
is a very important one.
link |
00:30:39.160
Now, on top of that, right?
link |
00:30:42.040
So if you look at the proteins, right,
link |
00:30:44.120
so you have this structural units
link |
00:30:46.360
and they carry out the function,
link |
00:30:48.200
but then much less is known about things
link |
00:30:51.880
that connect this protein domains,
link |
00:30:54.400
something that we call linkers.
link |
00:30:56.400
And those linkers are completely flexible, you know,
link |
00:31:00.760
parts of the protein that nevertheless
link |
00:31:03.520
carry out a lot of function.
link |
00:31:06.360
So it's like little tails, little heads.
link |
00:31:08.040
So, so, so we do have tails.
link |
00:31:09.840
So they're called termini, C and N termini.
link |
00:31:12.320
So these are things right on the, on, on, on one
link |
00:31:17.160
and another ends of the protein sequence.
link |
00:31:20.040
So they are also very important.
link |
00:31:22.560
So they, they attributed to very specific interactions
link |
00:31:26.320
between the proteins.
link |
00:31:27.720
So.
link |
00:31:28.560
But you're referring to the links between domains.
link |
00:31:30.800
That connect the domains.
link |
00:31:32.600
And, you know, apart from the, just the,
link |
00:31:36.160
the simple perspective, if you have, you know,
link |
00:31:39.840
a very short domain, you have, sorry, a very short linker,
link |
00:31:43.720
you have two domains next to each other.
link |
00:31:45.880
They are forced to be next to each other.
link |
00:31:47.560
If you have a very long one,
link |
00:31:49.080
you have the domains that are extremely flexible
link |
00:31:52.040
and they carry out a lot of sort of
link |
00:31:54.320
spatial reorganization, right?
link |
00:31:56.880
That's awesome.
link |
00:31:58.120
But on top of that, right, just this linker itself,
link |
00:32:01.960
because it's so flexible, it actually can adapt
link |
00:32:05.760
to a lot of different shapes.
link |
00:32:07.480
And therefore it's a, it's a very good interactor
link |
00:32:11.080
when it comes to interaction between this protein
link |
00:32:14.000
and other protein, right?
link |
00:32:15.720
So these things also evolve, you know,
link |
00:32:18.920
and they in a way have different sort of laws of
link |
00:32:25.480
the driving laws that underlie the evolution
link |
00:32:30.600
because they no longer need to,
link |
00:32:33.400
to preserve certain structure, right?
link |
00:32:37.120
Unlike protein domains.
link |
00:32:38.880
And so on top of that,
link |
00:32:41.480
you have something that is even less studied.
link |
00:32:45.840
And this is something that attribute to,
link |
00:32:49.640
to the concept of alternative splicing.
link |
00:32:53.240
So alternative splicing.
link |
00:32:54.480
So it's a, it's a very cool concept.
link |
00:32:56.920
It's something that we've been fascinated about for,
link |
00:33:00.840
you know, over a decade in my lab
link |
00:33:03.520
and trying to do research with that.
link |
00:33:05.520
But so, you know, so typically, you know,
link |
00:33:08.080
a simplistic perspective is that one gene
link |
00:33:12.480
is equal one protein product, right?
link |
00:33:16.040
So you have a gene, you know,
link |
00:33:18.320
you transcribe it and translate it
link |
00:33:21.120
and it becomes a protein.
link |
00:33:24.600
In reality, when we talk about eukaryotes,
link |
00:33:28.360
especially sort of more recent eukaryotes
link |
00:33:32.320
that are very complex,
link |
00:33:33.800
the gene is no longer equal to one protein.
link |
00:33:40.200
It actually can produce multiple functionally,
link |
00:33:47.040
you know, active protein products.
link |
00:33:50.280
And each of them is, you know,
link |
00:33:52.720
is called an alternatively spliced product.
link |
00:33:57.040
The reason it happens is that if you look at the gene,
link |
00:34:00.960
it actually has, it has also blocks.
link |
00:34:05.560
And the blocks, some of which,
link |
00:34:08.320
and it's essentially, it goes like this.
link |
00:34:10.680
So we have a block that will later be translated.
link |
00:34:13.880
We call it exon.
link |
00:34:15.040
Then we'll have a block that is not translated, cut out.
link |
00:34:19.240
We call it intron.
link |
00:34:20.400
So we have exon, intron, exon, intron,
link |
00:34:22.840
et cetera, et cetera, et cetera, right?
link |
00:34:24.120
So sometimes you can have, you know,
link |
00:34:26.920
dozens of these exons and introns.
link |
00:34:29.880
So what happens is during the process
link |
00:34:32.680
when the gene is converted to RNA,
link |
00:34:37.320
we have things that are cut out,
link |
00:34:41.280
the introns that are cut out,
link |
00:34:43.240
and exons that now get assembled together.
link |
00:34:47.160
And sometimes we will throw out some of the exons
link |
00:34:52.320
and the remaining protein product will become
link |
00:34:54.600
still be the same.
link |
00:34:55.440
Different.
link |
00:34:56.280
Oh, different.
link |
00:34:57.120
So now you have fragments of the protein
link |
00:34:59.960
that are no longer there.
link |
00:35:01.360
They were cut out with the introns.
link |
00:35:03.800
Sometimes you will essentially take one exon
link |
00:35:07.520
and replace it with another one, right?
link |
00:35:09.840
So there's some flexibility in this process.
link |
00:35:12.600
So that creates a whole new level of complexity.
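The exon and intron picture described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the gene, its segment sequences, and the function names `splice` and `isoforms` are all made up for illustration, and the model assumes the first and last exons are always kept. Real splicing is regulated and far richer than this.

```python
from itertools import combinations

# Hypothetical toy gene: alternating exons (kept) and introns (cut out).
# The sequences are arbitrary strings, not taken from any real gene.
GENE = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGT"),
    ("exon", "GATTAC"),
    ("intron", "GTCCAG"),
    ("exon", "TTTTAA"),
]


def splice(gene, skipped_exons=frozenset()):
    """Remove every intron, and also drop exons whose index is in skipped_exons.

    Exon indices count exons only (0 is the first exon, 1 the second, ...).
    """
    kept = []
    exon_index = 0
    for kind, sequence in gene:
        if kind == "exon":
            if exon_index not in skipped_exons:
                kept.append(sequence)
            exon_index += 1
    return "".join(kept)


def isoforms(gene):
    """List every product obtained by skipping any subset of internal exons."""
    exon_count = sum(1 for kind, _ in gene if kind == "exon")
    internal = range(1, exon_count - 1)  # assumption: first and last exon stay
    products = []
    for size in range(len(internal) + 1):
        for skipped in combinations(internal, size):
            products.append(splice(gene, frozenset(skipped)))
    return products


full_length = splice(GENE)
all_products = isoforms(GENE)
```

With three exons there is one internal exon, so this toy gene yields two products: the full-length one and one with the middle exon skipped. That is the "one gene, more than one protein product" point in miniature.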
link |
00:35:17.200
Cause now.
link |
00:35:18.040
Is this random though?
link |
00:35:18.880
Is it random?
link |
00:35:19.720
It's not random.
link |
00:35:20.840
We, and this is where I think now the appearance
link |
00:35:24.480
of this modern single cell
link |
00:35:27.360
and before that tissue level sequencing,
link |
00:35:31.240
next generation sequencing techniques such as RNA-seq
link |
00:35:34.280
allows us to see that these are the events
link |
00:35:38.200
that often happen in response.
link |
00:35:41.040
It's a dynamic event that happens in response
link |
00:35:44.560
to disease or in response
link |
00:35:48.320
to certain developmental stage of a cell.
link |
00:35:51.800
And this is an incredibly complex layer
link |
00:35:56.840
that also undergoes, I mean,
link |
00:35:59.800
because it's at the gene level, right?
link |
00:36:01.560
So it undergoes certain evolution, right?
link |
00:36:05.440
And now we have this interplay
link |
00:36:08.680
between what is happening in the protein world
link |
00:36:12.720
and what is happening in the gene and RNA world.
link |
00:36:17.720
And for example, it's often that we see
link |
00:36:22.720
that the boundaries of this exons coincide
link |
00:36:28.200
with the boundaries of the protein domains, right?
link |
00:36:32.160
So there is this close interplay to that.
link |
00:36:36.520
It's not always, I mean, otherwise it would be too simple,
link |
00:36:39.280
right?
link |
00:36:40.120
But we do see the connection
link |
00:36:41.880
between those sort of machineries.
link |
00:36:45.000
And obviously the evolution will pick up this complexity
link |
00:36:49.760
and, you know.
link |
00:36:51.800
Select for whatever is successful,
link |
00:36:53.480
whatever is interesting function.
link |
00:36:55.040
We see that complexity in play
link |
00:36:57.560
and makes this question more complex, but more exciting.
link |
00:37:02.560
Small detour, I don't know if you think about this
link |
00:37:05.440
into the world of computer science.
link |
00:37:07.540
There's a Douglas Hofstadter, I think,
link |
00:37:11.240
came up with the name of Quine,
link |
00:37:14.360
which are, I don't know if you're familiar
link |
00:37:16.180
with these things, but it's computer programs
link |
00:37:18.880
that have, I guess, exon and intron,
link |
00:37:22.160
and they copy, the whole purpose of the program
link |
00:37:24.800
is to copy itself.
link |
00:37:26.240
So it prints copies of itself,
link |
00:37:28.480
but can also carry information inside of it.
link |
00:37:30.980
So it's a very kind of crude, fun exercise of,
link |
00:37:36.420
can we sort of replicate these ideas from cells?
link |
00:37:40.000
Can we have a computer program that when you run it,
link |
00:37:42.940
just print itself, the entirety of itself,
link |
00:37:47.080
and does it in different programming languages and so on.
link |
00:37:50.040
I've been playing around and writing them.
link |
00:37:51.960
It's a kind of fun little exercise.
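For readers who have not seen one, here is a minimal sketch of a self-printing program in Python. It uses the well-known `%r` string trick, which is one of many ways to write a quine and not necessarily the style discussed here. The helper `run_and_capture` and the variable names are assumptions added only so the claim "it prints itself" can be checked.

```python
import contextlib
import io

# The two-line quine is built from a template that contains a slot (%r)
# for its own quoted text. Filling the slot with the template itself
# yields the complete program source.
template = 's = %r\nprint(s %% s)'
quine_source = template % template + "\n"


def run_and_capture(source):
    """Execute Python source in a fresh namespace and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# A quine's output is, character for character, its own source.
output = run_and_capture(quine_source)
is_quine = output == quine_source
```

The loose parallel with a cell is that the string plays two roles: it is data to be copied, and it is the instructions that do the copying.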
link |
00:37:53.720
You know, when I was a kid, so you know,
link |
00:37:55.720
it was essentially one of the sort of main stages
link |
00:38:02.860
in informatics Olympiads that you have to reach
link |
00:38:08.280
in order to be any good,
link |
00:38:10.880
is you should be able to write a program
link |
00:38:14.400
that replicates itself.
link |
00:38:16.680
And so the task then becomes even sort of more complicated.
link |
00:38:20.920
So what is the shortest program?
link |
00:38:24.040
And of course, it's a function of a programming language,
link |
00:38:27.480
but yeah, I remember a long, long, long time ago
link |
00:38:30.940
when we tried to make it shorter and shorter
link |
00:38:34.840
and find the shortcut.
link |
00:38:36.520
There's actually, on Stack Exchange, there's an entire site
link |
00:38:41.720
called Code Golf, I think,
link |
00:38:44.160
where the entirety is just the competition.
link |
00:38:46.560
People just come up with whatever task, I don't know,
link |
00:38:50.380
like write code that reports the weather today.
link |
00:38:54.680
And the competition is about whatever programming language,
link |
00:38:58.680
what is the shortest program?
link |
00:39:00.440
And it makes you actually, people should check it out
link |
00:39:02.280
because it makes you realize
link |
00:39:03.640
there's some weird programming languages out there.
link |
00:39:07.160
But just to dig on that a little deeper,
link |
00:39:12.640
do you think, in computer science,
link |
00:39:16.100
we don't often think about programs,
link |
00:39:19.280
just like the machine learning world now,
link |
00:39:22.280
that's still kind of basic programs.
link |
00:39:26.280
And then there's humans that replicate themselves, right?
link |
00:39:29.600
And there's these mutations and so on.
link |
00:39:31.440
Do you think we'll ever have a world
link |
00:39:34.520
where there's programs that kind of
link |
00:39:37.760
have an evolutionary process?
link |
00:39:40.640
So I'm not talking about evolutionary algorithms,
link |
00:39:42.640
but I'm talking about programs that kind of
link |
00:39:44.640
mate with each other and evolve
link |
00:39:46.480
and like on their own replicate themselves.
link |
00:39:49.600
So this is kind of the idea here is,
link |
00:39:54.640
that's how you can have a runaway thing.
link |
00:39:57.140
So we think about machine learning as a system
link |
00:39:59.240
that gets smarter and smarter and smarter and smarter.
link |
00:40:01.320
At least the machine learning systems of today are like,
link |
00:40:05.240
it's a program that you can like turn off,
link |
00:40:09.000
as opposed to throwing a bunch of little programs out there
link |
00:40:12.680
and letting them like multiply and mate
link |
00:40:15.560
and evolve and replicate.
link |
00:40:17.320
Do you ever think about that kind of world,
link |
00:40:20.400
when we jump from the biological systems
link |
00:40:23.360
that you're looking at to artificial ones?
link |
00:40:27.160
I mean, it's almost like you take the sort of the area
link |
00:40:32.440
of intelligent agents, right?
link |
00:40:34.360
Which are essentially the independent sort of codes
link |
00:40:38.640
that run and interact and exchange the information, right?
link |
00:40:42.480
So I don't see why not.
link |
00:40:45.120
I mean, it could be sort of a natural evolution
link |
00:40:48.760
in this area of computer science.
link |
00:40:52.880
I think it's kind of an interesting possibility.
link |
00:40:54.640
It's terrifying too,
link |
00:40:55.880
but I think it's a really powerful tool.
link |
00:40:58.360
Like to have like agents that, you know,
link |
00:41:00.680
we have social networks with millions of people
link |
00:41:02.800
and they interact.
link |
00:41:03.840
I think it's interesting to inject into that,
link |
00:41:05.720
was already injected into that bots, right?
link |
00:41:08.380
But those bots are pretty dumb.
link |
00:41:11.240
You know, they're probably pretty dumb algorithms.
link |
00:41:15.680
You know, it's interesting to think
link |
00:41:17.480
that there might be bots that evolve together with humans.
link |
00:41:20.440
And there's the sea of humans and robots
link |
00:41:23.960
that are operating first in the digital space.
link |
00:41:26.520
And then you can also think, I love the idea.
link |
00:41:29.080
Some people worked, I think at Harvard, at Penn,
link |
00:41:32.600
there's robotics labs that, you know,
link |
00:41:37.560
take as a fundamental task to build a robot
link |
00:41:40.600
that given extra resources can build another copy of itself,
link |
00:41:44.920
like in the physical space,
link |
00:41:46.560
which is super difficult to do, but super interesting.
link |
00:41:50.900
I remember there's like research on robots
link |
00:41:54.020
that can build a bridge.
link |
00:41:55.240
So they make a copy of themselves
link |
00:41:56.880
and they connect themselves
link |
00:41:57.960
and the sort of like self building bridge
link |
00:42:00.560
based on building blocks.
link |
00:42:02.380
You can imagine like a building that self assembles.
link |
00:42:05.640
So it's basically self assembling structures
link |
00:42:07.560
from robotic parts.
link |
00:42:10.620
But it's interesting to, within that robot,
link |
00:42:13.880
add the ability to mutate
link |
00:42:15.920
and do all the interesting like little things
link |
00:42:21.320
that you're referring to in evolution
link |
00:42:23.200
to go from a single origin protein building block
link |
00:42:26.320
to like this weird complex.
link |
00:42:28.920
And if you think about this, I mean, you know,
link |
00:42:30.960
the bits and pieces are there, you know.
link |
00:42:34.600
So you mentioned the evolution algorithm, right?
link |
00:42:37.040
You know, so this is sort of,
link |
00:42:38.520
and maybe sort of the goal is in a way different, right?
link |
00:42:43.520
So the goal is to, you know, to essentially,
link |
00:42:46.720
to optimize your search, right?
link |
00:42:50.080
So, but sort of the ideas are there.
link |
00:42:53.060
So people recognize that, you know,
link |
00:42:55.080
that the recombination events lead to global changes
link |
00:43:01.160
in the search trajectories, the mutations event
link |
00:43:04.440
is a more refined, you know, step in the search.
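The two roles mentioned here, recombination as a large jump and mutation as a small refinement, can be sketched with a tiny genetic algorithm. Everything below is an illustrative assumption: the "count the ones" fitness function, the parameter values, and the function names are chosen for brevity and do not come from any particular paper or library.

```python
import random


def fitness(genome):
    """Toy objective: the number of 1 bits in the genome."""
    return sum(genome)


def crossover(parent_a, parent_b, rng):
    """Recombination: splice a prefix of one parent onto a suffix of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(genome, rate, rng):
    """Mutation: flip each bit independently with a small probability."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def evolve(length=32, pop_size=30, generations=40, rate=0.02, seed=0):
    """Run the search and return the best genome plus best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # keep the fitter half as parents
        children = [population[0]]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rate, rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_genome, best_history = evolve()
```

Because the best individual is carried over unchanged each generation, the best fitness in `best_history` can never decrease. Crossover moves a child far from either parent in one step, while the low mutation rate only nudges it.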
link |
00:43:09.080
Then you have, you know, other sort of
link |
00:43:14.220
nature inspired algorithm, right?
link |
00:43:16.480
So one of the reasons that, you know,
link |
00:43:19.580
I think it's one of the funnest one
link |
00:43:21.940
is the slime based algorithm, right?
link |
00:43:24.820
So it's, I think the first was introduced
link |
00:43:28.220
by the Japanese group,
link |
00:43:30.220
where it was able to solve some pretty complex problems.
link |
00:43:35.220
So that's, and then I think there are still a lot of things
link |
00:43:43.340
we've yet to, you know, borrow from the nature, right?
link |
00:43:48.960
So there are a lot of sort of ideas
link |
00:43:52.020
that nature, you know, gets to offer us that, you know,
link |
00:43:56.740
it's up to us to grab it and to, you know,
link |
00:44:01.020
get the best use of it.
link |
00:44:02.140
Including neural networks, you know, we have a very crude
link |
00:44:06.380
inspiration from nature on neural networks.
link |
00:44:08.300
Maybe there's other inspirations to be discovered
link |
00:44:10.920
in the brain or other aspects of the various systems,
link |
00:44:16.280
even like the immune system, the way it interplays.
link |
00:44:20.140
I recently started to understand that the,
link |
00:44:22.580
like the immune system has something to do
link |
00:44:24.360
with the way the brain operates.
link |
00:44:26.020
Like there's multiple things going on in there,
link |
00:44:28.360
which all of which are not modeled
link |
00:44:30.500
in artificial neural networks.
link |
00:44:32.140
And maybe if you throw a little bit of that biological spice
link |
00:44:35.380
in there, you'll come up with something, something cool.
link |
00:44:39.020
I'm not sure if you're familiar with the Drake equation
link |
00:44:43.740
that estimate, I just did a video on it yesterday
link |
00:44:46.740
because I wanted to give my own estimate of it.
link |
00:44:49.280
It's an equation that combines a bunch of factors
link |
00:44:52.340
to estimate how many alien civilizations are in the galaxy.
link |
00:44:56.980
I've heard about it, yes.
link |
00:44:58.500
So one of the interesting parameters, you know,
link |
00:45:01.340
it's like how many stars are born every year,
link |
00:45:05.980
how many planets are on average per star for this,
link |
00:45:11.700
how many habitable planets are there.
link |
00:45:14.260
And then the one that starts being really interesting
link |
00:45:18.660
is the probability that life emerges on a habitable planet.
link |
00:45:24.740
So like, I don't know if you think about,
link |
00:45:27.900
you certainly think a lot about evolution,
link |
00:45:29.720
but do you think about the thing
link |
00:45:31.060
which evolution doesn't describe,
link |
00:45:32.520
which is like the beginning of evolution, the origin of life.
link |
00:45:36.620
I think I put the probability of life developing
link |
00:45:39.320
in a habitable planet at 1%.
link |
00:45:41.800
This is very scientifically rigorous.
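In its standard form the Drake equation is just a product of factors, N = R* × f_p × n_e × f_l × f_i × f_c × L. A minimal sketch follows. The only number taken from this conversation is the 1% chance of life emerging on a habitable planet; every other value is a placeholder assumption chosen to keep the arithmetic easy, not an estimate anyone has endorsed.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Expected number of detectable civilizations in the galaxy.

    r_star   -- stars formed per year
    f_p      -- fraction of stars with planets
    n_e      -- habitable planets per star that has planets
    f_l      -- fraction of habitable planets where life emerges
    f_i      -- fraction of those where intelligence evolves
    f_c      -- fraction of those that become detectable
    lifetime -- years a civilization stays detectable
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime


# Placeholder inputs; only f_l = 0.01 comes from the conversation.
estimate = drake(
    r_star=2.0,
    f_p=1.0,
    n_e=0.5,
    f_l=0.01,
    f_i=0.1,
    f_c=0.1,
    lifetime=10_000,
)
```

With these placeholders the product comes out to about one civilization. Since the factors simply multiply, changing the 1% figure by a factor of ten changes the answer by the same factor, which is why that single guess matters so much.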
link |
00:45:44.440
Okay, well, first at a high level for the Drake equation,
link |
00:45:48.740
what would you put that percent at on earth?
link |
00:45:51.660
And in general, do you have something,
link |
00:45:55.100
do you have thoughts about how life might've started,
link |
00:45:58.220
you know, like the proteins being the first kind of,
link |
00:46:01.100
one of the early jumping points?
link |
00:46:02.940
Yeah, so I think back in 2018,
link |
00:46:07.500
there was a very exciting paper published in Nature
link |
00:46:10.460
where they found one of the simplest amino acids,
link |
00:46:18.320
glycine, in a comet dust.
link |
00:46:23.320
So this is, and I apologize if I don't pronounce,
link |
00:46:29.440
it's a Russian named comet,
link |
00:46:31.840
it's I think Churyumov-Gerasimenko.
link |
00:46:34.760
This is the comet where, and there was this mission
link |
00:46:40.000
to get close to this comet and get the stardust
link |
00:46:46.320
from its tail.
link |
00:46:48.160
And when scientists analyzed it,
link |
00:46:50.620
they actually found traces of, you know, of glycine,
link |
00:46:56.640
which, you know, makes up, you know,
link |
00:46:59.400
it's one of the basic, one of the 20 basic amino acids
link |
00:47:04.180
that makes up proteins, right?
link |
00:47:06.400
So that was kind of very exciting, right?
link |
00:47:10.960
But, you know, the question is very interesting, right?
link |
00:47:14.220
So what, you know, if there is some alien life,
link |
00:47:18.540
is it gonna be made of proteins, right?
link |
00:47:22.940
Or maybe RNAs, right?
link |
00:47:24.340
So we see that, you know, the RNA viruses are certainly,
link |
00:47:29.120
you know, very well established sort of, you know,
link |
00:47:35.020
group of molecular machines, right?
link |
00:47:37.820
So, yeah, it's a very interesting question.
link |
00:47:42.140
What probability would you put?
link |
00:47:43.580
Like, how hard is this job?
link |
00:47:45.260
Like, how unlikely just on Earth do you think
link |
00:47:48.740
this whole thing is that we got going?
link |
00:47:51.600
Like, are we really lucky or is it inevitable?
link |
00:47:54.620
Like, what's your sense when you sit back
link |
00:47:56.240
and think about life on Earth?
link |
00:47:58.820
Is it higher or lower than 1%?
link |
00:48:00.980
Well, because 1% is pretty low, but it still is like,
link |
00:48:03.420
damn, that's a pretty good chance.
link |
00:48:05.060
Yes, it's a pretty good chance.
link |
00:48:06.600
I mean, I would, personally, but again, you know,
link |
00:48:10.580
I'm, you know, probably not the best person
link |
00:48:14.140
to do such estimations, but I would, you know,
link |
00:48:19.340
intuitively, I would probably put it lower.
link |
00:48:23.100
But still, I mean, you know, given.
link |
00:48:24.820
So we're really lucky here on Earth.
link |
00:48:27.980
I mean.
link |
00:48:28.820
Or the conditions are really good.
link |
00:48:30.500
It's, you know, I think that there was,
link |
00:48:32.340
everything was right in a way, right?
link |
00:48:35.460
So we still, it's not, the conditions were not like ideal
link |
00:48:39.720
if you try to look at, you know, what was, you know,
link |
00:48:44.060
several billions years ago when the life emerged.
link |
00:48:48.340
So there is something called the Rare Earth Hypothesis
link |
00:48:52.020
that, you know, in counter to the Drake Equation says
link |
00:48:55.740
that the, you know, the conditions of Earth,
link |
00:49:00.240
if you actually were to describe Earth,
link |
00:49:03.260
it's quite a special place.
link |
00:49:05.700
So special it might be unique in our galaxy
link |
00:49:09.120
and potentially, you know, close to unique
link |
00:49:11.780
in the entire universe.
link |
00:49:12.860
Like it's very difficult to reconstruct
link |
00:49:14.740
those same conditions.
link |
00:49:16.380
And what the Rare Earth Hypothesis argues
link |
00:49:19.580
is all those different conditions are essential for life.
link |
00:49:23.100
And so that's sort of the counter, you know,
link |
00:49:26.180
like all the things we, you know,
link |
00:49:29.220
thinking that Earth is pretty average.
link |
00:49:31.740
I mean, I can't really, I'm trying to remember
link |
00:49:34.340
to go through all of them, but just the fact
link |
00:49:36.140
that it is shielded from a lot of asteroids,
link |
00:49:41.140
the, obviously the distance to the sun,
link |
00:49:43.820
but also the fact that it's like a perfect balance
link |
00:49:48.220
between the amount of water and land
link |
00:49:52.180
and all those kinds of things.
link |
00:49:53.660
I don't know, there's a bunch of different factors
link |
00:49:55.180
that I don't remember, there's a long list.
link |
00:49:57.520
But it's fascinating to think about if in order
link |
00:50:01.260
for something like proteins and then DNA and RNA
link |
00:50:05.020
to emerge, you need, and basic living organisms,
link |
00:50:10.020
you need to be very close to an Earth like planet,
link |
00:50:14.960
which will be sad or exciting, I don't know which.
link |
00:50:19.740
If you ask me, I, you know, in a way I put a parallel
link |
00:50:23.220
between, you know, between our own research.
link |
00:50:28.380
And I mean, from the intuitive perspective,
link |
00:50:33.820
you know, you have those two extremes
link |
00:50:36.700
and the reality is never very rarely falls
link |
00:50:40.820
into the extremes.
link |
00:50:41.900
It's always, the optimum is always reached somewhere in between.
link |
00:50:46.500
So, and that's what I tend to think.
link |
00:50:50.060
I think that, you know, we're probably somewhere in between.
link |
00:50:54.020
So we're not unique, unique, but again,
link |
00:50:58.220
the chances are, you know, reasonably small.
link |
00:51:01.940
The problem is we don't know the other extreme
link |
00:51:04.180
is like, I tend to think that we don't actually understand
link |
00:51:08.060
the basic mechanisms of like what this is all originated
link |
00:51:11.900
from, like, it seems like we think of life
link |
00:51:15.060
as this distinct thing, maybe intelligence
link |
00:51:17.100
is a distinct thing, maybe the physics that,
link |
00:51:20.380
from which planets and suns are born is a distinct thing.
link |
00:51:24.380
But that could be a very, it's like the Stephen Wolfram
link |
00:51:27.140
thing, it's like the, from simple rules emerges
link |
00:51:29.420
greater and greater complexity.
link |
00:51:31.020
So, you know, I tend to believe that just life finds a way.
link |
00:51:36.100
Like, we don't know the extreme of how common life is
link |
00:51:39.540
because it could be life is like everywhere.
link |
00:51:44.980
Like, so everywhere that it's almost like laughable,
link |
00:51:49.420
like that we're such idiots to think we're unique.
link |
00:51:52.140
Like, it's like ridiculous to even like think,
link |
00:51:56.260
it's like ants thinking that their little colony
link |
00:51:59.460
is the unique thing and everything else doesn't exist.
link |
00:52:03.220
I mean, it's also very possible that that's the extreme
link |
00:52:07.540
and we're just not able to maybe comprehend
link |
00:52:09.900
the nature of that life.
link |
00:52:12.860
Just to stick on alien life for just a brief moment more,
link |
00:52:16.580
there is some signs of life on Venus in gaseous form.
link |
00:52:22.260
There's hope for life on Mars, probably extinct.
link |
00:52:27.260
We're not talking about intelligent life.
link |
00:52:29.220
Although that has been in the news recently.
link |
00:52:32.340
We're talking about basic like, you know, bacteria.
link |
00:52:36.300
Yeah, and then also, I guess, there's a couple moons.
link |
00:52:40.820
Europa.
link |
00:52:41.660
Yeah, Europa, which is Jupiter's moon.
link |
00:52:45.100
I think there's another one.
link |
00:52:46.580
Are you, is that exciting or is it terrifying to you
link |
00:52:50.380
that we might find life?
link |
00:52:52.140
Do you hope we find life?
link |
00:52:53.580
I certainly do hope that we find life.
link |
00:52:56.020
I mean, it was very exciting to hear about this news
link |
00:53:05.380
about the possible life on Venus.
link |
00:53:09.260
It'd be nice to have hard evidence of something with,
link |
00:53:12.540
which is what the hope is for Mars and Europa.
link |
00:53:17.140
But do you think those organisms
link |
00:53:18.420
will be similar biologically
link |
00:53:20.780
or would they even be sort of carbon based
link |
00:53:23.940
if we do find them?
link |
00:53:25.740
I would say they would be carbon based.
link |
00:53:28.940
How similar, it's a big question, right?
link |
00:53:31.820
So it's the moment we discover things outside Earth, right?
link |
00:53:39.540
Even if it's a tiny little single cell.
link |
00:53:43.260
I mean, there is so much.
link |
00:53:45.380
Just imagine that, that would be so.
link |
00:53:47.660
I think that that would be another turning point
link |
00:53:50.700
for the science, you know?
link |
00:53:52.540
Especially if it's different in some very new way.
link |
00:53:56.220
That's exciting.
link |
00:53:57.060
Because that says, that's a definitive statement,
link |
00:53:59.700
not a definitive, but a pretty strong statement
link |
00:54:01.780
that life is everywhere in the universe.
link |
00:54:05.420
To me at least, that's really exciting.
link |
00:54:08.940
You brought up Joshua Lederberg in an offline conversation.
link |
00:54:13.460
I think I'd love to talk to you about AlphaFold
link |
00:54:15.780
and this might be an interesting way
link |
00:54:17.220
to enter that conversation because,
link |
00:54:19.580
so he won the 1958 Nobel Prize in Physiology or Medicine
link |
00:54:24.500
for discovering that bacteria can mate and exchange genes.
link |
00:54:29.020
But he also did a ton of other stuff,
link |
00:54:32.220
like we mentioned, helping NASA find life on Mars
link |
00:54:37.740
and the...
link |
00:54:40.980
Dendral. Dendral.
link |
00:54:42.580
The chemical expert system.
link |
00:54:45.260
Expert systems, remember those?
link |
00:54:46.860
What do you find interesting about this guy
link |
00:54:51.380
and his ideas about artificial intelligence in general?
link |
00:54:54.980
So I have a kind of personal story to share.
link |
00:55:00.180
So I started my PhD in Canada back in 2000.
link |
00:55:05.140
And so essentially my PhD was,
link |
00:55:07.740
so we were developing sort of a new language
link |
00:55:10.100
for symbolic machine learning.
link |
00:55:12.540
So it's different from the feature based machine learning.
link |
00:55:15.100
And one of the sort of cleanest applications
link |
00:55:19.820
of this approach, of this formalism
link |
00:55:23.980
was to cheminformatics and computer aided drug design.
link |
00:55:28.820
So essentially we were, as a part of my research,
link |
00:55:33.820
I developed a system that essentially looked
link |
00:55:37.380
at chemical compounds of say the same therapeutic category,
link |
00:55:42.380
you know, male hormones, right?
link |
00:55:45.540
And try to figure out the structural fragments
link |
00:55:51.740
that are the structural building blocks
link |
00:55:54.420
that are important that define this class
link |
00:55:58.140
versus structural building blocks
link |
00:55:59.780
that are there just because, you know,
link |
00:56:02.700
to complete the structure.
link |
00:56:04.260
But they are not essentially the ones
link |
00:56:06.060
that make up the chemical, the key chemical properties
link |
00:56:10.020
of this therapeutic category.
link |
00:56:12.780
And, you know, for me, it was something new.
link |
00:56:16.900
I was trained as an applied mathematician, you know,
link |
00:56:20.580
with some machine learning background,
link |
00:56:22.980
but, you know, computer aided drug design
link |
00:56:25.060
was a completely new territory.
link |
00:56:27.660
So because of that, I often found myself
link |
00:56:31.500
asking lots of questions on one of these
link |
00:56:34.340
sort of central forums.
link |
00:56:36.940
Back then, there were no Facebooks or stuff like that.
link |
00:56:40.420
There was a forum, you know, it's a forum.
link |
00:56:43.620
It's essentially, it's like a bulletin board.
link |
00:56:45.780
Yeah.
link |
00:56:46.620
On the internet.
link |
00:56:47.460
Yeah, so you essentially, you have a bunch of people
link |
00:56:50.300
and you post a question and you get, you know,
link |
00:56:52.900
an answer from, you know, different people.
link |
00:56:55.300
And back then, just like one of the most popular forums
link |
00:56:59.300
was CCL, I think Computational Chemistry Library,
link |
00:57:04.300
not library, but something like that,
link |
00:57:07.100
but CCL, that was the forum.
link |
00:57:09.820
And there, I, you know, I...
link |
00:57:12.780
Asked a lot of dumb questions.
link |
00:57:14.060
Yes, I asked questions.
link |
00:57:15.500
Also shared some, you know, some information
link |
00:57:19.340
about what our formalism is and how we do it
link |
00:57:21.460
and whether whatever we do makes sense.
link |
00:57:25.100
And so, you know, and I remember that one of these posts,
link |
00:57:29.180
I mean, I still remember, you know,
link |
00:57:31.420
I would call it desperately looking
link |
00:57:35.340
for a chemist's advice, something like that, right?
link |
00:57:40.740
And so I post my question, I explained, you know,
link |
00:57:43.980
what the formalism is, what it does
link |
00:57:49.220
and what kind of applications I'm planning to do.
link |
00:57:53.180
And, you know, and it was, you know,
link |
00:57:55.020
in the middle of the night and I went back to bed.
link |
00:57:59.660
And next morning, I have a phone call from my advisor
link |
00:58:04.780
who also looked at this forum.
link |
00:58:06.900
It's like, you won't believe who replied to you.
link |
00:58:11.020
And it's like, who?
link |
00:58:13.900
And he said, well, you know, there is a message
link |
00:58:16.300
to you from Joshua Lederberg.
link |
00:58:19.140
And my reaction was like, who is Joshua Lederberg?
link |
00:58:22.660
Your advisor hung up. So, and essentially, you know,
link |
00:58:29.660
Joshua wrote me that we had conceptually similar ideas
link |
00:58:34.060
in the Dendral project.
link |
00:58:36.660
You may wanna look it up.
link |
00:58:39.300
And we should also, sorry, and it's a side comment,
link |
00:58:42.620
say that even though he won the Nobel Prize
link |
00:58:45.940
at a really young age, in 58, but so he was,
link |
00:58:49.820
I think he was what, 33.
link |
00:58:52.860
It's just crazy.
link |
00:58:53.980
So anyway, so that's, so hence in the 90s,
link |
00:58:57.660
responding to young whippersnappers on the CCL forum.
link |
00:59:02.100
Okay.
link |
00:59:02.940
And so back then he was already very senior.
link |
00:59:05.820
I mean, he unfortunately passed away back in 2008,
link |
00:59:09.580
but, you know, back in 2001, he was, I mean,
link |
00:59:12.580
he was a professor emeritus at Rockefeller University.
link |
00:59:15.980
And, you know, that was actually, believe it or not,
link |
00:59:18.460
one of the reasons I decided to join, you know,
link |
00:59:25.460
as a postdoc, the group of Andrej Sali,
link |
00:59:28.140
who was at Rockefeller University,
link |
00:59:30.820
with the hope that, you know, that I could actually,
link |
00:59:33.460
you know, have a chance to meet Joshua in person.
link |
00:59:38.060
And I met him very briefly, right?
link |
00:59:42.140
Just because he was walking, you know,
link |
00:59:45.380
there's a little bridge that connects the,
link |
00:59:47.860
sort of the research campus with the,
link |
00:59:51.940
with the sort of skyscraper that Rockefeller owns,
link |
00:59:55.500
where, you know, postdocs and faculty
link |
00:59:58.780
and graduate students live.
link |
01:00:00.260
And so I met him, you know,
link |
01:00:02.460
and had a very short conversation, you know.
link |
01:00:06.340
But so I started, you know, reading about Dendral
link |
01:00:10.380
and I was amazed, you know, it's,
link |
01:00:12.660
we're talking about 1960, right?
link |
01:00:16.100
The ideas were so profound.
link |
01:00:19.300
Well, what's profound about the ideas of it?
link |
01:00:21.140
The reason to make this is even crazier.
link |
01:00:25.020
So, Lederberg wanted to make a system
link |
01:00:29.860
that would help him study the extraterrestrial molecules,
link |
01:00:38.220
right?
link |
01:00:39.060
So, the idea was that, you know,
link |
01:00:40.980
the way you study the extraterrestrial molecules
link |
01:00:43.420
is you do the mass spec analysis, right?
link |
01:00:46.780
And so the mass spec gives you sort of bits,
link |
01:00:49.700
numbers about essentially gives you the ideas
link |
01:00:52.620
about the possible fragments or, you know,
link |
01:00:55.900
atoms, you know, and maybe a little fragments,
link |
01:00:59.820
pieces of this molecule that make up the molecule, right?
link |
01:01:03.620
So now you need to sort of,
link |
01:01:06.060
to decompose this information
link |
01:01:09.180
and to figure out what was the whole
link |
01:01:12.460
before it became fragments, bits and pieces, right?
link |
01:01:17.660
So, in order to make this, you know,
link |
01:01:20.860
to have this tool, the idea of Lederberg
link |
01:01:25.660
was to connect chemistry, computer science,
link |
01:01:32.060
and to design this so called expert system
link |
01:01:36.100
that looks, that takes into account,
link |
01:01:38.180
that takes as an input the mass spec data,
link |
01:01:42.180
the possible database of possible molecules
link |
01:01:47.980
and essentially try to sort of induce the molecule
link |
01:01:52.660
that would correspond to these spectra
link |
01:01:55.580
or, you know, essentially what this project ended up being
link |
01:02:03.060
was that, you know, it would provide a list of candidates
link |
01:02:07.100
that then a chemist would look at and make final decision.
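As a rough illustration of the "list of candidates" idea described here, the following is a minimal sketch, not the actual Dendral algorithm. It assumes a hypothetical database that maps each candidate molecule to the fragment masses it would be expected to produce, and it ranks candidates by how many observed mass spec peaks they explain. The function name, the tolerance, and all the numbers are invented for illustration.

```python
def rank_candidates(observed_peaks, candidates, tolerance=0.5):
    """Rank candidate molecules by how many observed peaks they explain.

    observed_peaks: list of fragment masses read off a mass spectrum.
    candidates: dict mapping a candidate name to its expected fragment masses
                (a hypothetical, hand-made database).
    tolerance: how far apart two masses may be and still count as a match
               (an assumed value, not a real instrument setting).
    """
    def explained(expected):
        # Count the observed peaks that lie close to some expected fragment mass.
        return sum(
            any(abs(peak - mass) <= tolerance for mass in expected)
            for peak in observed_peaks
        )

    scored = [(explained(expected), name) for name, expected in candidates.items()]
    # Best-explaining candidate first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored]


# Toy numbers only; these are not real spectra.
toy_peaks = [15.0, 29.0, 43.0]
toy_database = {
    "candidate_a": [15.0, 29.1, 43.2],
    "candidate_b": [15.0, 31.0],
    "candidate_c": [57.0],
}
shortlist = rank_candidates(toy_peaks, toy_database)
```

The output is a ranked shortlist rather than a single answer, which mirrors the point that a chemist still looks at the candidates and makes the final decision.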
link |
01:02:11.940
So.
link |
01:02:12.780
But the original idea, I suppose,
link |
01:02:13.980
is to solve the entirety of this problem automatically.
link |
01:02:16.820
Yes, yes.
link |
01:02:17.660
So he, you know, so he,
link |
01:02:21.940
back then he approached. 60s.
link |
01:02:25.180
Yes, believe that, it's amazing.
link |
01:02:28.940
I mean, it still blows my mind, you know, that it's,
link |
01:02:32.220
that's, and this was essentially the origin
link |
01:02:37.420
of the modern bioinformatics, cheminformatics,
link |
01:02:41.100
you know, back in 60s.
link |
01:02:42.780
So that's, you know, every time you deal with projects
link |
01:02:48.540
like this, with the, you know, research like this,
link |
01:02:51.340
you just, you know, so the power of the, you know,
link |
01:02:56.340
intelligence of these people is just, you know, overwhelming.
link |
01:03:01.740
What do you think about expert systems, is there,
link |
01:03:05.420
and why they kind of didn't become successful,
link |
01:03:10.380
especially in the space of bioinformatics,
link |
01:03:12.500
where it does seem like there is a lot of expertise
link |
01:03:15.380
in humans, and, you know, it's possible to see
link |
01:03:20.060
that a system like this could be made very useful.
link |
01:03:23.580
Right.
link |
01:03:24.420
And be built up.
link |
01:03:25.260
So it's actually, it's a great question,
link |
01:03:26.900
and this is something, so, you know, so, you know,
link |
01:03:30.460
at my university, I teach artificial intelligence,
link |
01:03:33.900
and, you know, we start, my first two lectures
link |
01:03:37.940
are on the history of AI.
link |
01:03:40.140
And there we, you know, we try to, you know,
link |
01:03:45.300
go through the main stages of AI.
link |
01:03:48.180
And so, you know, the question of why expert systems failed
link |
01:03:54.260
or became obsolete, it's actually a very interesting one.
link |
01:03:58.540
And there are, you know, if you try to read the, you know,
link |
01:04:01.980
the historical perspectives,
link |
01:04:03.340
there are actually two lines of thoughts.
link |
01:04:05.540
One is that they were essentially
link |
01:04:11.940
not up to the expectations.
link |
01:04:14.820
And so therefore they were replaced, you know,
link |
01:04:18.020
by other things, right?
link |
01:04:21.180
The other one was that completely opposite one,
link |
01:04:25.340
that they were too good.
link |
01:04:28.140
And as a result, they essentially became
link |
01:04:31.900
sort of a household name,
link |
01:04:33.180
and then essentially they got transformed.
link |
01:04:37.100
I mean, in both cases, sort of the outcome was the same.
link |
01:04:40.700
They evolved into something, right?
link |
01:04:43.740
And that's what I, you know, if I look at this, right?
link |
01:04:47.700
So the modern machine learning, right?
link |
01:04:50.180
So.
link |
01:04:51.020
So there's echoes in the modern machine learning.
link |
01:04:53.260
I think so, I think so, because, you know,
link |
01:04:55.340
if you think about this, you know, and how we design,
link |
01:04:59.140
you know, the most successful algorithms,
link |
01:05:02.500
including AlphaFold, right?
link |
01:05:04.140
You build in the knowledge about the domain
link |
01:05:08.100
that you study, right?
link |
01:05:09.940
So you build in your expertise.
link |
01:05:12.900
So speaking of AlphaFold,
link |
01:05:14.460
so DeepMind's AlphaFold 2 recently was announced
link |
01:05:18.260
to have, quote unquote, solved protein folding.
link |
01:05:21.980
But how exciting is this to you?
link |
01:05:24.220
It seems to be one of the,
link |
01:05:27.060
one of the exciting things that have happened in 2020.
link |
01:05:29.660
It's an incredible accomplishment from the looks of it.
link |
01:05:32.340
What part of it is amazing to you?
link |
01:05:33.860
What part would you say is overhyped
link |
01:05:36.300
or maybe misunderstood?
link |
01:05:39.020
It's definitely a very exciting achievement.
link |
01:05:41.940
To give you a little bit of perspective, right?
link |
01:05:43.820
So in bioinformatics, we have several competitions.
link |
01:05:50.020
And so the way, you know, you often hear
link |
01:05:53.940
how those competitions have been explained
link |
01:05:56.220
to sort of to non bioinformaticians is that, you know,
link |
01:05:59.820
they call it bioinformatics Olympic games.
link |
01:06:01.900
And there are several disciplines, right?
link |
01:06:03.620
So the historically one of the first one
link |
01:06:07.020
was the discipline in predicting the protein structure,
link |
01:06:10.300
predicting the 3D coordinates of the protein.
link |
01:06:12.580
But there are some others.
link |
01:06:13.580
So the predicting protein functions,
link |
01:06:16.740
predicting effects of mutations on protein functions,
link |
01:06:21.460
then predicting protein-protein interactions.
link |
01:06:24.900
So the original one was CASP
link |
01:06:28.100
or Critical Assessment of protein Structure Prediction.
link |
01:06:32.020
And the, you know, typically what happens
link |
01:06:40.020
during these competitions is, you know, scientists,
link |
01:06:43.980
experimental scientists solve the structures,
link |
01:06:48.380
but don't put them into the Protein Data Bank,
link |
01:06:51.700
which is the centralized database
link |
01:06:54.700
that contains all the 3D coordinates.
link |
01:06:57.260
Instead, they hold it and release protein sequences.
link |
01:07:02.340
And now the challenge of the community
link |
01:07:05.420
is to predict the 3D structures of these proteins
link |
01:07:10.180
and then use the experimentally resolved structures
link |
01:07:12.940
to assess which one is the closest one, right?
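To make "assess which one is the closest" concrete, here is a minimal sketch of a scoring function in the spirit of GDT_TS, the score commonly reported for CASP: the average, over distance cutoffs of 1, 2, 4, and 8 angstroms, of the fraction of residues predicted within that cutoff of the experimental position. This sketch assumes the predicted and experimental coordinates are already superimposed and listed in the same residue order, whereas the real measure also searches over superpositions. The function names are hypothetical.

```python
import math


def gdt_ts_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score on a 0 to 100 scale.

    predicted, experimental: lists of (x, y, z) coordinates, one per residue,
    assumed to be already superimposed and in the same residue order.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each cutoff, then the average over cutoffs.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def rank_submissions(submissions, experimental):
    """Order groups from closest to farthest from the experimental structure."""
    scores = {
        group: gdt_ts_like(coords, experimental)
        for group, coords in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The key point of the setup is that the experimental coordinates are withheld until predictions are submitted, so a function like this can only be run by the assessors after the fact.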
link |
01:07:16.620
And this competition, by the way,
link |
01:07:17.740
just a bunch of different tangents.
link |
01:07:19.540
And maybe you can also say, what is protein folding?
link |
01:07:22.860
Then this competition, CASP competition
link |
01:07:25.020
has become the gold standard.
link |
01:07:27.420
And that's what was used to say
link |
01:07:29.500
that protein folding was solved.
link |
01:07:32.420
So just to add a little, just a bunch.
link |
01:07:35.300
So if you could, whenever you say stuff,
link |
01:07:37.700
maybe throw in some of the basics
link |
01:07:39.380
for the folks that might be outside of the field.
link |
01:07:41.580
Anyway, sorry.
link |
01:07:42.740
So, yeah, so, you know, so the reason it's, you know,
link |
01:07:45.900
it's relevant to our understanding of protein folding
link |
01:07:50.260
is because, you know, we've yet to learn
link |
01:07:54.180
how the folding mechanistically works, right?
link |
01:07:58.140
So there are different hypotheses,
link |
01:08:00.740
what happens to this fold?
link |
01:08:02.780
For example, there is a hypothesis that the folding happens
link |
01:08:07.620
by, you know, also in the modular fashion, right?
link |
01:08:12.660
So that, you know, we have protein domains
link |
01:08:16.220
that get folded independently
link |
01:08:17.940
because their structure is stable.
link |
01:08:19.700
And then the whole protein structure gets formed.
link |
01:08:23.380
But, you know, within those domains,
link |
01:08:25.380
we also have a so called secondary structure,
link |
01:08:27.460
the small alpha helices, beta sheets.
link |
01:08:29.820
So these are, you know, elements that are structurally stable.
link |
01:08:34.340
And so, and the question is, you know,
link |
01:08:37.820
when do they get formed?
link |
01:08:40.340
Because some of the secondary structure elements,
link |
01:08:42.580
you have to have, you know, a fragment in the beginning
link |
01:08:46.500
and say the fragment in the middle, right?
link |
01:08:49.420
So you cannot potentially start having the full fold
link |
01:08:54.780
from the get go, right?
link |
01:08:57.100
So it's still, you know, it's still a big enigma,
link |
01:09:00.340
what happens.
link |
01:09:01.420
We know that it's an extremely efficient
link |
01:09:04.260
and stable process, right?
link |
01:09:05.660
So there's this long sequence
link |
01:09:07.660
and the fold happens really quickly.
link |
01:09:09.500
Exactly.
link |
01:09:10.340
So that's really weird, right?
link |
01:09:11.180
And it happens like the same way almost every time.
link |
01:09:15.340
Exactly, exactly.
link |
01:09:16.380
That's really weird.
link |
01:09:17.860
That's freaking weird.
link |
01:09:19.060
It's, yeah, that's why it's such an amazing thing.
link |
01:09:22.900
But most importantly, right?
link |
01:09:24.300
So it's, you know, so when you see the, you know,
link |
01:09:27.460
the translation process, right?
link |
01:09:29.260
So when you don't have the whole protein translated,
link |
01:09:36.100
right, it's still being translated,
link |
01:09:37.860
you know, getting out from the ribosome,
link |
01:09:41.180
you already see some structural, you know, fragmentation.
link |
01:09:45.780
So folding starts happening
link |
01:09:49.300
before the whole protein gets produced, right?
link |
01:09:52.780
And so this is obviously, you know,
link |
01:09:55.060
one of the biggest questions in, you know,
link |
01:09:59.220
in modern molecular biology.
link |
01:10:00.980
Not like maybe what happens,
link |
01:10:04.180
like that's bigger than the question of folding.
link |
01:10:07.860
That's the question of like,
link |
01:10:09.540
something like deeper fundamental idea of folding.
link |
01:10:12.460
Yes. Behind folding.
link |
01:10:13.380
Exactly, exactly.
link |
01:10:14.620
So, you know, so obviously if we are able to predict
link |
01:10:21.340
the end product of protein folding,
link |
01:10:24.060
we are one step closer to understanding
link |
01:10:27.660
sort of the mechanisms of the protein folding.
link |
01:10:30.220
Because we can then potentially look and start probing
link |
01:10:34.700
what are the critical parts of this process
link |
01:10:38.260
and what are not so critical parts of this process.
link |
01:10:41.260
So we can start decomposing this, you know,
link |
01:10:44.420
so in a way this protein structure prediction algorithm
link |
01:10:50.100
can be used as a tool, right?
link |
01:10:53.700
So you change the, you know, you modify the protein,
link |
01:10:59.220
you get back to this tool, it predicts,
link |
01:11:02.380
okay, it's completely unstable.
link |
01:11:04.940
Yeah, which aspects of the input
link |
01:11:07.820
will have a big impact on the output?
link |
01:11:09.860
Exactly, exactly.
link |
01:11:11.140
So what happens is, you know,
link |
01:11:13.340
we typically have some sort of incremental advancement,
link |
01:11:18.700
you know, each stage of this CASP competition,
link |
01:11:22.580
you have groups with incremental advancement
link |
01:11:25.340
and, you know, historically the top performing groups
link |
01:11:29.860
were, you know, they were not using machine learning.
link |
01:11:34.420
They were using a very advanced biophysics
link |
01:11:37.700
combined with bioinformatics,
link |
01:11:39.620
combined with, you know, the data mining
link |
01:11:43.220
and that was, you know, that would enable them
link |
01:11:47.380
to obtain protein structures of those proteins
link |
01:11:52.660
that don't have any structurally solved relatives
link |
01:11:57.540
because, you know, if we have another protein,
link |
01:12:01.860
say the same protein, but coming from a different species,
link |
01:12:07.500
we could potentially derive some ideas
link |
01:12:10.460
and that's so called homology or comparative modeling,
link |
01:12:13.220
where we'll derive some ideas
link |
01:12:15.300
from the previously known structures
link |
01:12:17.540
and that would help us tremendously
link |
01:12:19.860
in, you know, in reconstructing the 3D structure overall.
link |
01:12:25.380
But what happens when we don't have these relatives?
link |
01:12:27.900
This is when it becomes really, really hard, right?
link |
01:12:31.220
So that's so called de novo, you know,
link |
01:12:35.260
de novo protein structure prediction.
link |
01:12:37.500
And in this case, those methods were traditionally very good.
link |
01:12:43.060
But what happened in the last year,
link |
01:12:46.300
the original AlphaFold came in
link |
01:12:50.640
and all of a sudden it's much better than everyone else.
link |
01:12:56.420
This is 2018.
link |
01:12:57.900
Yeah.
link |
01:12:58.740
Oh, and the competition is only every two years, I think.
link |
01:13:02.140
And then, so, you know, it was sort of kind of a shockwave
link |
01:13:08.060
to the bioinformatics community that, you know,
link |
01:13:10.740
we have like a state of the art machine learning system
link |
01:13:15.440
that does, you know, structure prediction.
link |
01:13:18.460
And essentially what it does, you know,
link |
01:13:20.780
so if you look at this, it actually predicts the contacts.
link |
01:13:26.120
So, you know, so the process of reconstructing
link |
01:13:29.460
the 3D structure starts by predicting the contacts
link |
01:13:34.700
between the different parts of the protein.
link |
01:13:38.860
And the contacts essentially are the parts of the proteins
link |
01:13:40.980
that are in a close proximity to each other.
link |
01:13:43.240
Right, so actually the machine learning part
link |
01:13:45.820
seems to be estimating, you can correct me if I'm wrong here,
link |
01:13:51.080
but it seems to be estimating the distance matrix,
link |
01:13:53.180
which is like the distance between the different parts.
link |
01:13:55.900
Yeah, so we call it the contact map.
link |
01:13:58.080
Contact map.
link |
01:13:58.920
So once you have the contact map,
link |
01:14:00.580
the reconstruction is becoming more straightforward, right?
link |
01:14:04.860
But so the contact map is the key.
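The relationship between the distance matrix and the contact map mentioned here can be sketched in a few lines. This is only an illustration of the data structure, computed from known coordinates; it says nothing about how AlphaFold predicts it. The 8 angstrom threshold is an assumed, commonly used convention, and the function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise distances between residues, given one (x, y, z) point each."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(coords, threshold=8.0, min_separation=1):
    """Boolean matrix: True where two residues are in close proximity.

    threshold: distance in angstroms below which a pair counts as a contact
               (8.0 is an assumption, not a value taken from the conversation).
    min_separation: minimum distance along the sequence, so that a residue
                    is not counted as being in contact with itself.
    """
    dist = distance_matrix(coords)
    n = len(coords)
    return [
        [abs(i - j) >= min_separation and dist[i][j] <= threshold for j in range(n)]
        for i in range(n)
    ]
```

A contact map is therefore a thresholded distance matrix: it throws away the exact distances and keeps only which parts of the chain end up near each other, which is the constraint the 3D reconstruction then has to satisfy.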
link |
01:14:06.780
And so, you know, so that's what happened.
link |
01:14:11.260
And now we started seeing in this current stage, right?
link |
01:14:15.980
Well, in the most recent one,
link |
01:14:18.500
we started seeing the emergence of these ideas
link |
01:14:22.020
in other people's works, right?
link |
01:14:25.080
But yet here's, you know, AlphaFold2
link |
01:14:29.500
that again outperforms everyone else.
link |
01:14:33.380
And also by introducing yet another wave
link |
01:14:35.780
of the machine learning ideas.
link |
01:14:38.620
Yeah, there does seem to be also an incorporation.
link |
01:14:41.260
First of all, the paper is not out yet,
link |
01:14:43.040
but there's a bunch of ideas already out.
link |
01:14:44.860
There does seem to be an incorporation of this other thing.
link |
01:14:48.100
I don't know if it's something that you could speak to,
link |
01:14:50.160
which is like the incorporation of like other structures,
link |
01:14:58.200
like evolutionarily similar structures
link |
01:15:01.720
that are used to kind of give you hints.
link |
01:15:03.820
Yes, so evolutionary similarity is something
link |
01:15:08.360
that we can detect at different levels, right?
link |
01:15:10.740
So we know, for example,
link |
01:15:12.860
that the structure of proteins
link |
01:15:17.140
is more conserved than the sequence.
link |
01:15:20.520
The sequence could be very different,
link |
01:15:22.340
but the structural shape is actually still very conserved.
link |
01:15:26.300
So that's sort of the intrinsic property that, you know,
link |
01:15:28.880
in a way related to protein folds,
link |
01:15:31.140
you know, to the evolution of the, you know,
link |
01:15:34.140
of the proteins and protein domains, et cetera.
link |
01:15:37.820
But we know that, I mean, there've been multiple studies.
link |
01:15:41.060
And, you know, ideally, if you have structures,
link |
01:15:45.340
you know, you should use that information.
link |
01:15:48.580
However, sometimes we don't have this information.
link |
01:15:51.220
Instead, we have a bunch of sequences.
link |
01:15:53.220
Sequences, we have a lot, right?
link |
01:15:54.860
So we have, you know, hundreds, thousands
link |
01:16:00.340
of, you know, different organisms sequenced, right?
link |
01:16:04.300
And by taking the same protein,
link |
01:16:07.840
but in different organisms and aligning it,
link |
01:16:11.620
so making it, you know, making the corresponding positions
link |
01:16:15.980
aligned, we can actually say a lot
link |
01:16:20.500
about sort of what is conserved in this protein
link |
01:16:24.220
and therefore, you know, structurally more stable,
link |
01:16:26.920
what is diverse in this protein.
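The idea of aligning the same protein across organisms and reading off what is conserved can be sketched as a per-column score. This minimal example assumes the sequences are already aligned to equal length and uses Shannon entropy as the variability measure, which is one common choice rather than the specific method discussed here. Gap characters are simply treated as another symbol, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues seen at one aligned position.

    0.0 means the position is perfectly conserved; higher means more diverse.
    """
    counts = Counter(column)
    total = len(column)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def conservation_profile(aligned_sequences):
    """Entropy for every column of a multiple sequence alignment."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*...) walks the alignment column by column.
    return [column_entropy(column) for column in zip(*aligned_sequences)]


# Toy alignment of the "same protein" from four hypothetical organisms.
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKAL"]
profile = conservation_profile(toy_alignment)
```

Low-entropy columns are the conserved positions, and high-entropy columns are the diverse ones.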
link |
01:16:28.920
So on top of that, we could provide sort of the information
link |
01:16:32.380
about the sort of the secondary structure
link |
01:16:35.100
of this protein, et cetera, et cetera.
link |
01:16:36.420
So this information is extremely useful
link |
01:16:39.940
and it's already there.
link |
01:16:41.300
So while it's tempting to, you know,
link |
01:16:44.140
to do a complete ab initio,
link |
01:16:46.060
so you just have a protein sequence and nothing else,
link |
01:16:49.540
the reality is such that we are overwhelmed with this data.
link |
01:16:54.220
So why not use it?
link |
01:16:56.500
And so, yeah, so I'm looking forward
link |
01:16:59.220
to reading this paper.
link |
01:17:01.500
It does seem to, like they've,
link |
01:17:03.420
in the previous version of AlphaFold,
link |
01:17:05.100
they didn't, for this evolutionary similarity thing,
link |
01:17:09.780
they didn't use machine learning for that.
link |
01:17:12.960
Or rather, they used it as like the input
link |
01:17:15.600
to the entirety of the neural net,
link |
01:17:17.880
like the features derived from the similarity.
link |
01:17:22.020
It seems like there's some kind of quote, unquote,
link |
01:17:24.660
iterative thing where it seems to be part of the learning
link |
01:17:30.500
process is the incorporation of this evolutionary similarity.
link |
01:17:34.260
Yeah, I don't think there is a bioRxiv paper, right?
link |
01:17:36.940
There's nothing.
link |
01:17:37.780
No, there's nothing.
link |
01:17:38.620
There's a blog post that's written
link |
01:17:40.680
by a marketing team, essentially,
link |
01:17:42.600
which, you know, it has some scientific similarity,
link |
01:17:48.420
probably, to the actual methodology used,
link |
01:17:51.780
but it could be, it's like interpreting scripture.
link |
01:17:55.220
It could be just poetic interpretations of the actual work
link |
01:17:59.660
as opposed to direct connection to the work.
link |
01:18:01.900
So now, speaking about protein folding, right?
link |
01:18:04.260
So, you know, in order to answer the question
link |
01:18:06.820
whether or not we have solved this, right?
link |
01:18:09.460
So we need to go back to the beginning of our conversation
link |
01:18:13.580
with the realization that an average protein
link |
01:18:16.100
is that typically what the CASP has been focusing on
link |
01:18:22.180
is this competition has been focusing
link |
01:18:25.820
on the single, maybe two domain proteins
link |
01:18:29.220
that are still very compact.
link |
01:18:31.060
And even those ones are extremely challenging to solve.
link |
01:18:35.400
But now we talk about, you know,
link |
01:18:37.660
an average protein that has two, three protein domains.
link |
01:18:42.420
If you look at the proteins that are in charge
link |
01:18:46.920
of the, you know, of the process with the neural system,
link |
01:18:51.120
right, perhaps one of the most recently evolved
link |
01:18:58.500
sort of systems in an organism, right?
link |
01:19:03.500
All of them, well, the majority of them
link |
01:19:06.360
are highly multi domain proteins.
link |
01:19:09.000
So they are, you know, some of them have five, six, seven,
link |
01:19:13.520
you know, and more domains, right?
link |
01:19:16.840
And, you know, we are very far away
link |
01:19:20.000
from understanding how these proteins are folded.
link |
01:19:22.400
So the complexity of the protein matters here.
link |
01:19:24.440
The complexity of the protein modules
link |
01:19:27.920
or the protein domains.
link |
01:19:30.220
So you're saying solved, so the definition
link |
01:19:35.220
of solved here is particularly the CASP competition
link |
01:19:38.620
achieving human level, not human level,
link |
01:19:41.760
achieving experimental level performance
link |
01:19:45.620
on these particular sets of proteins
link |
01:19:48.520
that have been used in these competitions.
link |
01:19:50.300
Well, I mean, you know, I do think that, you know,
link |
01:19:54.740
especially with regards to AlphaFold,
link |
01:19:57.500
you know, it is able to, you know, to solve,
link |
01:20:03.020
you know, at the near experimental level,
link |
01:20:08.980
a pretty big majority of the more compact proteins
link |
01:20:15.000
like, or protein domains.
link |
01:20:16.360
Because again, in order to understand
link |
01:20:18.740
how the overall protein, you know,
link |
01:20:22.800
multi domain protein fold, we do need to understand
link |
01:20:26.220
the structure of its individual domains.
link |
01:20:28.760
I mean, unlike if you look at AlphaZero
link |
01:20:31.140
or like even MuZero, if you look at that work,
link |
01:20:36.500
you know, it's nice reinforcement learning
link |
01:20:39.540
self playing mechanisms are nice
link |
01:20:41.100
cause it's all in simulation.
link |
01:20:42.380
So you can learn from just huge amounts.
link |
01:20:45.920
Like you don't need data.
link |
01:20:47.340
It was like the problem with proteins,
link |
01:20:49.740
like the size, I forget how many 3D structures
link |
01:20:54.540
have been mapped, but the training data is very small.
link |
01:20:56.980
No matter what, it's like millions,
link |
01:20:59.060
maybe a one or two million or something like that,
link |
01:21:01.400
but it's some very small number,
link |
01:21:02.940
but like, it doesn't seem like that's scalable.
link |
01:21:06.820
There has to be, I don't know,
link |
01:21:09.380
it feels like you want to somehow 10 X the data
link |
01:21:13.100
or a hundred X the data somehow.
link |
01:21:15.700
Yes, but we also can take advantage of homology models,
link |
01:21:20.700
right, so the models that are of very good quality
link |
01:21:26.740
because they are essentially obtained
link |
01:21:30.660
based on the evolutionary information, right?
link |
01:21:33.720
So you can, there is a potential to enhance this information
link |
01:21:38.540
and, you know, use it again to empower the training set.
link |
01:21:43.540
And it's, I think, I am actually very optimistic.
link |
01:21:49.780
I think it's been one of these sort of, you know,
link |
01:21:58.100
turning events where you have a system that is,
link |
01:22:05.220
you know, a machine learning system
link |
01:22:07.300
that is truly better than the...
link |
01:22:12.300
Better than the sort of the more conventional
link |
01:22:15.740
biophysics based methods.
link |
01:22:17.940
That's a huge leap.
link |
01:22:19.380
This is one of those fun questions,
link |
01:22:21.280
but where would you put it in the ranking
link |
01:22:26.720
of the greatest breakthroughs
link |
01:22:28.540
in artificial intelligence history?
link |
01:22:31.740
So like, okay, so let's see who's in the running.
link |
01:22:34.940
Maybe you can correct me.
link |
01:22:35.860
So you got like AlphaZero and AlphaGo
link |
01:22:39.900
beating the world champion at the game of Go.
link |
01:22:44.500
Thought to be impossible like 20 years ago.
link |
01:22:48.220
Or at least the AI community was highly skeptical.
link |
01:22:51.340
Then you got like also Deep Blue versus Kasparov.
link |
01:22:55.060
You have deep learning itself,
link |
01:22:56.260
like the maybe, what would you say,
link |
01:22:58.280
the AlexNet, ImageNet moment.
link |
01:23:00.940
So the first neural network
link |
01:23:02.780
achieving human level performance.
link |
01:23:04.780
Super, that's not true.
link |
01:23:07.860
Achieving like a big leap in performance
link |
01:23:10.980
on the computer vision problem.
link |
01:23:14.420
There is OpenAI, the whole like GPT-3,
link |
01:23:18.980
that whole space of transformers and language models
link |
01:23:23.020
just achieving this incredible performance
link |
01:23:27.120
of application of neural networks to language models.
link |
01:23:31.780
Boston Dynamics, pretty cool.
link |
01:23:33.540
Like robotics.
link |
01:23:35.220
People are like, there's no AI.
link |
01:23:38.200
No, no, there's no machine learning currently.
link |
01:23:41.520
But AI is much bigger than machine learning.
link |
01:23:44.520
So that just the engineering aspect,
link |
01:23:48.860
I would say it's one of the greatest accomplishments
link |
01:23:50.780
in engineering side.
link |
01:23:52.860
Engineering meaning like mechanical engineering
link |
01:23:56.140
of robotics ever.
link |
01:23:57.980
Then of course, autonomous vehicles.
link |
01:23:59.500
You can argue for Waymo,
link |
01:24:01.300
which is like the Google self driving car.
link |
01:24:03.580
Or you can argue for Tesla,
link |
01:24:05.460
which is like actually being used
link |
01:24:07.860
by hundreds of thousands of people on the road today,
link |
01:24:10.740
machine learning system.
link |
01:24:13.700
And I don't know if you can, what else is there?
link |
01:24:17.560
But I think that's it.
link |
01:24:18.700
And then AlphaFold, many people are saying
link |
01:24:20.900
is up there, potentially number one.
link |
01:24:23.300
Would you put them at number one?
link |
01:24:24.820
Well, in terms of the impact on the science
link |
01:24:29.820
and on the society beyond, it's definitely,
link |
01:24:34.060
to me would be one of the...
link |
01:24:37.460
Top three?
link |
01:24:39.060
What you want?
link |
01:24:39.900
Maybe, I mean, I'm probably not the best person
link |
01:24:43.020
to answer that.
link |
01:24:45.460
But I do have, I remember my,
link |
01:24:51.540
back in, I think 1997, when Deep Blue,
link |
01:24:56.380
beat Kasparov, it was, I mean, it was a shock.
link |
01:25:01.860
I mean, it was, and I think for the,
link |
01:25:04.180
for a pretty substantial part of the world,
link |
01:25:14.220
that especially people who have some experience with chess,
link |
01:25:21.740
and realizing how incredibly human this game,
link |
01:25:25.660
how much of a brain power you need
link |
01:25:30.220
to reach those levels of grandmasters, right, level.
link |
01:25:36.020
And it's probably one of the first times,
link |
01:25:37.920
and how good Kasparov was.
link |
01:25:39.780
And again, yeah, so Kasparov's arguably
link |
01:25:42.300
one of the best ever, right?
link |
01:25:45.580
And you get a machine that beats him.
link |
01:25:47.860
All right, so it's...
link |
01:25:48.820
First time a machine probably beat a human
link |
01:25:50.740
at that scale of a thing, of anything.
link |
01:25:53.720
Yes, yes.
link |
01:25:54.740
So that was, to me, that was like, you know,
link |
01:25:57.220
one of the groundbreaking events in the history of AI.
link |
01:26:00.620
Yeah, that's probably number one.
link |
01:26:02.340
Probably, like we don't, it's hard to remember.
link |
01:26:05.460
It's like Muhammad Ali versus, I don't know,
link |
01:26:08.100
any of the Mike Tyson, something like that.
link |
01:26:09.900
It's like, nah, you gotta put Muhammad Ali at number one.
link |
01:26:13.660
Same with Deep Blue,
link |
01:26:15.300
even though it's not machine learning based.
link |
01:26:19.340
Still, it uses advanced search,
link |
01:26:21.540
and search is the integral part of AI, right?
link |
01:26:24.420
It's not, people don't think of it that way at this moment.
link |
01:26:27.660
In vogue currently, search is not seen
link |
01:26:30.900
as a fundamental aspect of intelligence,
link |
01:26:34.220
but it very well, I mean, it very likely is.
link |
01:26:37.700
In fact, I mean, that's what neural networks are,
link |
01:26:39.660
is they're just performing search
link |
01:26:41.260
on the space of parameters, and it's all search.
link |
01:26:45.540
All of intelligence is some form of search,
link |
01:26:47.740
and you just have to become cleverer and cleverer
link |
01:26:49.660
at that search problem.
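The claim that training is "search on the space of parameters" can be made concrete with a toy sketch. This is not how neural networks are actually trained (that is normally gradient descent); it assumes a two-parameter line fit and a plain random hill climb, chosen only because it makes the search framing visible. The names `loss` and `hill_climb` and the data are illustrative.

```python
import random

def loss(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def hill_climb(data, steps=5000, step_size=0.1, seed=0):
    """Search parameter space: keep a random perturbation only if it lowers the loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    best = loss(w, b, data)
    for _ in range(steps):
        cand_w = w + rng.uniform(-step_size, step_size)
        cand_b = b + rng.uniform(-step_size, step_size)
        cand = loss(cand_w, cand_b, data)
        if cand < best:
            w, b, best = cand_w, cand_b, cand
    return w, b, best

# Toy data generated from y = 2x + 1; the search should move toward w=2, b=1.
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]
start_loss = loss(0.0, 0.0, data)
w, b, final_loss = hill_climb(data)
print(start_loss, final_loss, w, b)
```

Gradient descent replaces the random proposals with steps along the slope of the loss, which is a cleverer way of doing the same search.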
link |
01:26:50.900
And I also have another one that you didn't mention
link |
01:26:53.980
that's one of my favorite ones is,
link |
01:26:58.260
so you've probably heard of this,
link |
01:26:59.860
it's, I think it's called Deep Rembrandt.
link |
01:27:03.420
It's the project where they trained,
link |
01:27:06.820
I think there was a collaboration
link |
01:27:08.220
between the sort of the experts
link |
01:27:11.580
in Rembrandt painting in Netherlands,
link |
01:27:15.500
and a group, an artificial intelligence group,
link |
01:27:18.300
where they train an algorithm
link |
01:27:20.220
to replicate the style of the Rembrandt,
link |
01:27:22.980
and they actually printed a portrait
link |
01:27:26.980
that never existed before in the style of Rembrandt.
link |
01:27:32.620
I think they printed it on a sort of,
link |
01:27:36.740
on the canvas that, you know,
link |
01:27:38.500
using pretty much same types of paints and stuff.
link |
01:27:42.580
To me, it was mind blowing.
link |
01:27:44.060
Yeah, and the space of art, that's interesting.
link |
01:27:46.900
There hasn't been, maybe that's it,
link |
01:27:50.100
but I think there hasn't been an ImageNet moment yet
link |
01:27:54.580
in the space of art.
link |
01:27:56.780
You haven't been able to achieve
link |
01:27:58.620
superhuman level performance in the space of art,
link |
01:28:01.420
even though there's this big famous thing
link |
01:28:04.660
where a piece of art was purchased,
link |
01:28:07.660
I guess for a lot of money.
link |
01:28:08.700
Yes.
link |
01:28:09.540
Yeah, but it's still, you know,
link |
01:28:11.660
people are like in the space of music at least,
link |
01:28:15.620
that's, you know, it's clear that human created pieces
link |
01:28:19.740
are much more popular.
link |
01:28:21.700
So there hasn't been a moment where it's like,
link |
01:28:24.420
oh, this is, we're now,
link |
01:28:26.700
I would say in the space of music,
link |
01:28:28.780
what makes a lot of money,
link |
01:28:30.140
we're talking about serious money,
link |
01:28:32.100
it's music and movies, or like shows and so on,
link |
01:28:35.300
and entertainment.
link |
01:28:36.640
There hasn't been a moment where AI created,
link |
01:28:41.280
AI was able to create a piece of music
link |
01:28:44.460
or a piece of cinema, like Netflix show,
link |
01:28:49.820
that is, you know, that's sufficiently popular
link |
01:28:53.540
to make a ton of money.
link |
01:28:55.260
Yeah.
link |
01:28:56.100
And that moment would be very, very powerful,
link |
01:28:58.940
because that's like, that's an AI system
link |
01:29:01.560
being used to make a lot of money.
link |
01:29:03.060
And like direct, of course, AI tools,
link |
01:29:05.480
like even Premiere, audio editing,
link |
01:29:07.140
all the editing, everything I do,
link |
01:29:08.780
to edit this podcast, there's a lot of AI involved.
link |
01:29:11.660
Actually, this is a program,
link |
01:29:13.260
I wanna talk to those folks, just cause I wanna nerd out,
link |
01:29:15.540
it's called iZotope, I don't know if you're familiar with it.
link |
01:29:18.060
They have a bunch of tools of audio processing,
link |
01:29:20.140
and they have, I think they're Boston based,
link |
01:29:23.080
just, it's so exciting to me to use it,
link |
01:29:26.380
like on the audio here,
link |
01:29:28.200
cause it's all machine learning.
link |
01:29:30.380
It's not, cause most audio production stuff
link |
01:29:35.780
is like any kind of processing you do,
link |
01:29:37.540
it's very basic signal processing,
link |
01:29:39.500
and you're tuning knobs and so on.
link |
01:29:41.980
They have all of that, of course,
link |
01:29:43.580
but they also have all of this machine learning stuff,
link |
01:29:46.020
like where you actually give it training data,
link |
01:29:48.520
you select parts of the audio you train on,
link |
01:29:51.740
you train on it, and it figures stuff out.
link |
01:29:56.380
It's great, it's able to detect,
link |
01:29:59.020
like the ability of it to be able
link |
01:30:01.380
to separate voice and music, for example,
link |
01:30:04.820
or voice and anything, is incredible.
link |
01:30:07.260
Like it just, it's clearly exceptionally good
link |
01:30:11.140
at applying these different neural networks models
link |
01:30:14.940
to just separate the different kinds
link |
01:30:17.740
of signals from the audio.
link |
01:30:19.180
That, okay, so that's really exciting.
link |
01:30:22.260
Photoshop, Adobe people also use it,
link |
01:30:24.580
but to generate a piece of music
link |
01:30:28.260
that will sell millions, a piece of art, yeah.
link |
01:30:31.980
No, I agree, and you know, it's,
link |
01:30:34.420
that's, you know, as I mentioned,
link |
01:30:39.220
I offer my AI class, and you know,
link |
01:30:41.700
an integral part of this is the project, right?
link |
01:30:44.660
So it's my favorite, ultimate favorite part,
link |
01:30:47.340
because it typically, we have these project presentations
link |
01:30:51.380
the last two weeks of the classes,
link |
01:30:53.720
right before, you know, the Christmas break,
link |
01:30:56.220
and it's sort of, it adds this cool excitement,
link |
01:31:00.300
and every time, I mean, I'm amazed, you know,
link |
01:31:02.660
with some projects that people, you know, come up with.
link |
01:31:07.660
And so, and quite a few of them are actually, you know,
link |
01:31:12.060
they have some link to arts.
link |
01:31:17.060
I mean, you know, I think last year we had a group
link |
01:31:21.260
who designed an AI producing hokku, Japanese poems.
link |
01:31:27.660
Oh, wow.
link |
01:31:29.380
So, and some of them, so, you know,
link |
01:31:31.820
it got trained on the English based,
link |
01:31:34.780
haikus, haikus, right?
link |
01:31:36.460
So, and some of them, you know,
link |
01:31:40.260
they get to present, like, the top selection.
link |
01:31:43.460
They were pretty good.
link |
01:31:44.300
I mean, you know, I mean, of course, I'm not a specialist,
link |
01:31:47.020
but you read them, and you see this is real.
link |
01:31:49.700
It seems profound.
link |
01:31:50.660
Yes, yeah, it seems real.
link |
01:31:52.780
So it's kind of cool.
link |
01:31:55.060
We also had a couple of projects where people tried
link |
01:31:57.940
to teach AI how to play, like, rock music, classical music.
link |
01:32:02.940
I think, and popular music.
link |
01:32:05.940
Yeah.
link |
01:32:07.820
Interestingly enough, you know,
link |
01:32:10.620
classical music was among the most difficult ones.
link |
01:32:14.580
Oh, sure.
link |
01:32:15.420
And, you know, of course, if you, if, you know,
link |
01:32:21.780
you know, if you look at the, you know,
link |
01:32:23.780
the, like, grandmasters of music, like Bach, right?
link |
01:32:28.780
So there is a lot of, there is a lot of,
link |
01:32:31.940
there is a lot of almost math.
link |
01:32:34.820
Yeah, well, he's very mathematical.
link |
01:32:36.580
Yeah, exactly.
link |
01:32:37.420
So this is, I would imagine that at least some style
link |
01:32:41.500
of this music could be picked up,
link |
01:32:43.820
but then you have this completely different spectrum
link |
01:32:46.980
of classical composers.
link |
01:32:49.260
And so, you know, it's almost like, you know,
link |
01:32:54.140
you don't have to sort of look at the data.
link |
01:32:56.780
You just listen to it and say, nah, that's not it, not yet.
link |
01:33:01.140
That's not it, yeah.
link |
01:33:02.380
That's how I feel too.
link |
01:33:03.340
There's OpenAI has, I think, OpenMuse
link |
01:33:05.820
or something like that, the system.
link |
01:33:07.540
It's cool, but it's like, eh,
link |
01:33:09.740
it's not compelling for some reason.
link |
01:33:12.060
It could be a psychological reason too.
link |
01:33:14.180
Maybe we need to have a human being,
link |
01:33:17.580
a tortured soul behind the music.
link |
01:33:19.660
I don't know.
link |
01:33:20.700
Yeah, no, absolutely.
link |
01:33:22.220
I completely agree.
link |
01:33:23.940
But yeah, whether or not we'll have,
link |
01:33:26.580
one day we'll have, you know,
link |
01:33:29.140
a song written by an AI engine
link |
01:33:33.340
to be like in top charts, musical charts,
link |
01:33:37.980
I wouldn't be surprised.
link |
01:33:40.540
I wouldn't be surprised.
link |
01:33:43.380
I wonder if we already have one
link |
01:33:44.700
and it just hasn't been announced.
link |
01:33:48.020
We wouldn't know.
link |
01:33:49.980
How hard is the multi protein folding problem?
link |
01:33:53.940
Is that kind of something you've already mentioned
link |
01:33:57.100
which is baked into this idea of greater
link |
01:33:59.180
and greater complexity of proteins?
link |
01:34:01.180
Like multi domain proteins,
link |
01:34:03.300
is that basically become multi protein complexes?
link |
01:34:08.940
Yes, you got it right.
link |
01:34:10.620
So it's sort of, it has the components
link |
01:34:15.980
of both of protein folding
link |
01:34:18.460
and protein-protein interactions.
link |
01:34:21.900
Because in order for these domains,
link |
01:34:24.460
many of these proteins actually,
link |
01:34:27.260
they never form a stable structure.
link |
01:34:31.140
One of my favorite proteins,
link |
01:34:33.020
and pretty much everyone who works in the,
link |
01:34:37.700
I know, whom I know, who works with proteins,
link |
01:34:41.740
they always have their favorite proteins.
link |
01:34:44.660
Right, so one of my favorite proteins,
link |
01:34:47.660
probably my favorite protein,
link |
01:34:49.140
the one that I worked on when I was a postdoc
link |
01:34:51.420
is the so-called postsynaptic density 95, PSD-95 protein.
link |
01:34:56.180
So it's one of the key actors
link |
01:35:00.500
in the majority of neurological processes
link |
01:35:03.780
at the molecular level.
link |
01:35:04.700
So it's a, and essentially it's a key player
link |
01:35:11.060
in the postsynaptic density.
link |
01:35:13.460
So this is the crucial part of this synapse
link |
01:35:17.180
where a lot of these chemical processes are happening.
link |
01:35:22.420
So it has five domains, right?
link |
01:35:26.220
So five protein domains.
link |
01:35:27.460
So a pretty large protein, I think 600-something amino acids.
link |
01:35:35.700
But the way it's organized itself, it's flexible, right?
link |
01:35:41.260
So it acts as a scaffold.
link |
01:35:43.820
So it is used to bring in other proteins.
link |
01:35:49.260
So they start acting in the orchestrated manner, right?
link |
01:35:54.260
So, and the type of the shape of this protein,
link |
01:35:58.780
it's in a way, there are some stable parts of this protein,
link |
01:36:02.500
but there are some flexible.
link |
01:36:04.420
And this flexibility is built into the protein
link |
01:36:08.580
in order to become sort of this multifunctional machine.
link |
01:36:13.100
So do you think that kind of thing is also learnable
link |
01:36:16.460
through the AlphaFold2 kind of approach?
link |
01:36:19.340
I mean, the time will tell.
link |
01:36:22.380
Is it another level of complexity?
link |
01:36:24.460
Is it like how big of a jump in complexity
link |
01:36:27.300
is that whole thing?
link |
01:36:28.140
To me, it's yet another level of complexity
link |
01:36:31.340
because when we talk about protein-protein interactions,
link |
01:36:35.140
and there is actually a different challenge for this
link |
01:36:38.820
called CAPRI, and so that is focused specifically
link |
01:36:43.420
on macromolecular interactions, protein-protein, protein-DNA,
link |
01:36:47.060
et cetera.
link |
01:36:48.540
So, but it's, there are different mechanisms
link |
01:36:56.020
that govern molecular interactions
link |
01:36:58.740
and that need to be picked up,
link |
01:37:00.740
say by a machine learning algorithm.
link |
01:37:03.660
Interestingly enough, we actually,
link |
01:37:06.500
we participated for a few years in this competition.
link |
01:37:11.740
We typically don't participate in competitions,
link |
01:37:14.900
I don't know, don't have enough time,
link |
01:37:19.820
because it's very intensive, it's a very intensive process.
link |
01:37:23.700
But we participated back in about 10 years ago or so.
link |
01:37:30.580
And the way we entered this competition,
link |
01:37:32.660
so we design a scoring function, right?
link |
01:37:35.420
So the function that evaluates
link |
01:37:37.580
whether or not your protein-protein interaction
link |
01:37:40.540
is supposed to look like experimentally solved, right?
link |
01:37:43.380
So the scoring function is very critical part
link |
01:37:45.900
of the model prediction.
link |
01:37:49.820
So we designed it to be a machine learning one.
link |
01:37:52.740
And so it was one of the first machine learning
link |
01:37:56.620
based scoring functions used in CAPRI.
link |
01:38:00.020
And we essentially learned what should contribute,
link |
01:38:06.580
what are the critical components contributing
link |
01:38:08.860
into the protein-protein interaction.
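A learned scoring function of the kind described here can be sketched in a few lines. The conversation does not say which features or which model the group used, so everything below is an assumption: two hypothetical interface features per candidate model (a contact score and a clash score, both scaled to 0..1), a plain logistic regression, and made-up training labels where 1 means "looks like an experimentally solved interface" and 0 means decoy.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_scorer(examples, epochs=2000, lr=0.1):
    """Fit logistic-regression weights with stochastic gradient descent.

    examples: list of (features, label); label 1 = near-native, 0 = decoy.
    """
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for feats, label in examples:
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            err = pred - label
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
    return weights, bias

def score(model, feats):
    """Return the learned probability that a candidate interface is near-native."""
    weights, bias = model
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)

# Made-up toy data: [contact score, clash score]; not real docking results.
toy = [
    ([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.7, 0.1], 1),
    ([0.2, 0.8], 0), ([0.3, 0.6], 0), ([0.1, 0.9], 0),
]
model = train_scorer(toy)

# Rank hypothetical candidate models, best first.
candidates = {"model_a": [0.85, 0.15], "model_b": [0.25, 0.70]}
ranked = sorted(candidates, key=lambda k: score(model, candidates[k]), reverse=True)
print(ranked)
```

The point of the design is that the weights on each feature are learned from solved complexes rather than set by hand, which is what "we essentially learned what should contribute" amounts to.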
link |
01:38:10.540
So this could be converted into a learning problem
link |
01:38:13.340
and thereby it could be learned?
link |
01:38:15.620
I believe so, yes.
link |
01:38:17.020
Do you think AlphaFold2 or something similar to it
link |
01:38:20.460
from DeepMind or somebody else will be,
link |
01:38:24.300
will result in a Nobel Prize or multiple Nobel Prizes?
link |
01:38:28.660
So like, you know, obviously, maybe not so obviously,
link |
01:38:33.300
you can't give a Nobel Prize to a computer program.
link |
01:38:38.660
At least for now, give it to the designers of that program.
link |
01:38:42.140
But do you see one or multiple Nobel Prizes
link |
01:38:46.060
where AlphaFold2 is like a large percentage
link |
01:38:51.700
of what that prize is given for?
link |
01:38:54.860
Would it lead to discoveries at the level of Nobel Prizes?
link |
01:39:00.540
I mean, I think we are definitely destined
link |
01:39:05.420
to see the Nobel Prize becoming sort of,
link |
01:39:08.740
to be evolving with the evolution of science
link |
01:39:12.340
and the evolution of science is such
link |
01:39:14.540
that it now becomes like really multifaceted, right?
link |
01:39:17.860
So where you don't really have like a unique discipline,
link |
01:39:21.340
you have sort of the, a lot of cross disciplinary talks
link |
01:39:25.660
in order to achieve sort of, you know,
link |
01:39:28.740
really big advancements, you know.
link |
01:39:32.380
So I think, you know, the computational methods
link |
01:39:39.180
will be acknowledged in one way or another.
link |
01:39:42.500
And as a matter of fact, you know,
link |
01:39:46.860
they were first acknowledged back in 2013, right?
link |
01:39:50.580
Where, you know, the first three people were, you know,
link |
01:39:56.500
awarded the Nobel Prize for studying protein folding,
link |
01:40:00.540
right, the principle.
link |
01:40:01.500
And, you know, I think all three of them
link |
01:40:03.820
are computational biophysicists, right?
link |
01:40:06.940
So, you know, that I think is unavoidable.
link |
01:40:13.260
You know, it will come with the time.
link |
01:40:16.580
The fact that, you know, AlphaFold and, you know,
link |
01:40:23.540
similar approaches, because again, it's a matter of time
link |
01:40:26.340
that people will embrace this, you know, principle
link |
01:40:31.700
and we'll see more and more such, you know,
link |
01:40:34.940
such tools coming into play.
link |
01:40:36.940
But, you know, these methods will be critical
link |
01:40:41.940
in a scientific discovery, no doubts about it.
link |
01:40:47.700
On the engineering side, maybe a dark question,
link |
01:40:51.500
but do you think it's possible to use
link |
01:40:53.860
these machine learning methods
link |
01:40:55.140
to start to engineer proteins?
link |
01:40:59.020
And the next question is something quite a few biologists
link |
01:41:04.660
are against, some are for, for study purposes,
link |
01:41:07.300
is to engineer viruses.
link |
01:41:09.620
Do you think machine learning, like something like AlphaFold
link |
01:41:12.620
could be used to engineer viruses?
link |
01:41:14.780
So to answer the first question, you know,
link |
01:41:16.980
it has been, you know, a part of the research
link |
01:41:21.660
in the protein science, the protein design is, you know,
link |
01:41:25.500
is a very prominent area of research.
link |
01:41:29.180
Of course, you know, one of the pioneers is David Baker
link |
01:41:32.020
and Rosetta algorithm that, you know,
link |
01:41:34.900
essentially was doing the de novo design and was used
link |
01:41:39.740
to design new proteins, you know.
link |
01:41:41.580
And design of proteins means design of function.
link |
01:41:44.220
So like when you design a protein, you can control,
link |
01:41:47.300
I mean, the whole point of a protein
link |
01:41:49.100
with the protein structure comes a function,
link |
01:41:52.180
like it's doing something.
link |
01:41:53.700
Correct.
link |
01:41:54.540
So you can design different things.
link |
01:41:56.060
So you can, yeah, so you can, well,
link |
01:41:58.140
you can look at the proteins from the functional perspective.
link |
01:42:00.700
You can also look at the proteins
link |
01:42:02.700
from the structural perspective, right?
link |
01:42:04.180
So the structural building blocks.
link |
01:42:05.700
So if you want to have a building block
link |
01:42:07.660
of a certain shape, you can try to achieve it
link |
01:42:10.540
by, you know, introducing a new protein sequence
link |
01:42:13.140
and predicting, you know, how it will fold.
link |
01:42:17.260
So with that, I mean, it's a natural,
link |
01:42:22.060
one of the, you know, natural applications
link |
01:42:25.820
of these algorithms.
link |
01:42:28.220
Now, talking about engineering a virus.
link |
01:42:34.140
With machine learning.
link |
01:42:35.140
With machine learning, right?
link |
01:42:36.380
So, well, you know, so luckily for us,
link |
01:42:41.740
I mean, we don't have that much data, right?
link |
01:42:46.780
Yeah.
link |
01:42:47.580
We actually, right now, one of the projects
link |
01:42:50.100
that we are carrying on in the lab
link |
01:42:53.700
is we're trying to develop a machine learning algorithm
link |
01:42:56.940
that determines the,
link |
01:42:59.300
whether or not the current strain is pathogenic.
link |
01:43:02.700
And the current strain of the coronavirus.
link |
01:43:04.620
Of the virus.
link |
01:43:06.100
I mean, so there are applications to coronaviruses
link |
01:43:08.980
because we have strains of SARS-CoV-2,
link |
01:43:11.460
also SARS-CoV, MERS that are pathogenic,
link |
01:43:14.580
but we also have strains of other coronaviruses
link |
01:43:17.620
that are, you know, not pathogenic.
link |
01:43:20.140
I mean, the common cold viruses and, you know,
link |
01:43:24.060
some other ones, right?
link |
01:43:25.580
So, so pathogenic meaning spreading.
link |
01:43:28.980
Pathogenic means actually inflicting damage.
link |
01:43:33.780
Correct.
link |
01:43:35.340
There are also some, you know,
link |
01:43:37.020
seasonal versus pandemic strains of influenza, right?
link |
01:43:41.780
And determining the, what are the molecular determinants,
link |
01:43:45.220
right?
link |
01:43:46.060
So that are built in, into the protein sequence,
link |
01:43:48.300
into the gene sequence, right?
link |
01:43:50.700
So, and whether or not the machine learning
link |
01:43:52.980
can determine those, those components, right?
link |
01:43:58.420
Oh, interesting.
link |
01:43:59.260
So like using machine learning to do,
link |
01:44:00.660
that's really interesting to, to, to given,
link |
01:44:03.940
give the input is like what the entire,
link |
01:44:07.380
the protein sequence and then determine
link |
01:44:09.740
if this thing is going to be able to do damage
link |
01:44:12.340
to a biological system.
link |
01:44:14.620
Yeah.
link |
01:44:15.900
So, so I mean,
link |
01:44:16.740
It's a good machine learning,
link |
01:44:17.580
you're saying we don't have enough data for that?
link |
01:44:19.380
We, I mean, for, for this specific one, we do.
link |
01:44:22.620
We might actually, I have, you know,
link |
01:44:24.460
have to back up on this because we're still in the process.
link |
01:44:27.260
There was one work that appeared in bioRxiv
link |
01:44:31.660
by Eugene Koonin, who is one of these, you know,
link |
01:44:34.900
pioneers in, in, in evolutionary genomics.
link |
01:44:39.020
And they tried to look at this, but, you know,
link |
01:44:42.820
the methods were sort of standard, you know,
link |
01:44:46.060
supervised learning methods.
link |
01:44:48.620
And now the question is, you know,
link |
01:44:51.340
can you advance it further by, by using, you know,
link |
01:44:56.660
not so standard methods, you know?
link |
01:44:58.620
So there's obviously a lot of hope in,
link |
01:45:01.140
in transfer learning where you can actually try to transfer
link |
01:45:05.580
the information that the machine learning learns about
link |
01:45:08.060
the proper protein sequences, right?
link |
01:45:11.340
And, you know, so, so there is some promise
link |
01:45:16.140
in going this direction, but if we have this,
link |
01:45:18.740
it would be extremely useful because then
link |
01:45:21.060
we could essentially forecast the potential mutations
link |
01:45:24.100
that would make the current strain
link |
01:45:26.300
more or less pathogenic.
link |
01:45:27.900
Anticipate, anticipate them from a vaccine development,
link |
01:45:31.140
for the treatment, antiviral drug development.
link |
01:45:34.580
That, that would be a very crucial task.
link |
01:45:36.860
But you could also use that system to then say,
link |
01:45:42.180
how would we potentially modify this virus
link |
01:45:45.260
to make it more pathogenic?
link |
01:45:47.940
This, that's true.
link |
01:45:49.660
That's true.
link |
01:45:50.500
And then, you know, the, again,
link |
01:45:55.980
the hope is, well, several things, right?
link |
01:45:59.700
So one is that, you know, it's,
link |
01:46:02.140
even if you design a, you know, a sequence, right?
link |
01:46:06.780
So to carry out the actual experimental biology,
link |
01:46:12.540
to ensure that all the components are working, you know,
link |
01:46:16.820
is a completely different matter.
link |
01:46:19.060
Difficult process.
link |
01:46:19.900
Yes.
link |
01:46:20.860
Then the, you know, we've seen in the past,
link |
01:46:24.420
there could be some regulation the moment
link |
01:46:27.660
the scientific community recognizes
link |
01:46:30.420
that it's now becoming no longer a sort of a fun puzzle
link |
01:46:34.620
to, you know, for machine learning.
link |
01:46:36.660
Could be open.
link |
01:46:37.860
Yeah, so then there might be some regulation.
link |
01:46:40.420
So I think back in, what, 2015, there was, you know,
link |
01:46:45.780
there was an issue on regulating the research
link |
01:46:49.500
on influenza strains, right?
link |
01:46:52.700
There were several groups, you know,
link |
01:46:55.580
used sort of the mutation analysis
link |
01:46:58.060
to determine whether or not this strain will jump
link |
01:47:01.820
from one species to another.
link |
01:47:03.300
And I think there was like a half a year moratorium
link |
01:47:06.540
on the research on the paper published
link |
01:47:09.780
until, you know, scientists, you know, analyzed it
link |
01:47:13.580
and decided that it's actually safe.
link |
01:47:16.440
I forgot what that's called.
link |
01:47:17.620
Something of function, test of function.
link |
01:47:20.020
Gain of function.
link |
01:47:20.860
Gain of function, yeah.
link |
01:47:22.380
Gain of function, loss of function, that's right.
link |
01:47:24.020
Sorry.
link |
01:47:26.420
It's like, let's watch this thing mutate for a while
link |
01:47:29.620
to see like, to see what kind of things we can observe.
link |
01:47:33.780
I guess I'm not so much worried
link |
01:47:36.320
about that kind of research if there's a lot of regulation
link |
01:47:38.620
and if it's done very well and with competence and seriously.
link |
01:47:42.780
I am more worried about kind of this, you know,
link |
01:47:46.980
the underlying aspect of this question
link |
01:47:49.580
is more like 50 years from now.
link |
01:47:52.920
Speaking to the Drake equation,
link |
01:47:54.940
one of the parameters in the Drake equation
link |
01:47:57.300
is how long civilizations last.
link |
01:47:59.820
And that seems to be the most important value actually
link |
01:48:03.860
for calculating if there's other alien
link |
01:48:06.100
intelligent civilizations out there.
link |
01:48:08.040
That's where there's most variability.
link |
01:48:10.960
Assuming like if life, if that percentage
link |
01:48:15.060
that life can emerge is like not zero,
link |
01:48:19.380
like if we're a super unique,
link |
01:48:21.260
then it's the how long we last
link |
01:48:23.940
is basically the most important thing.
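The role of civilization lifetime in the Drake equation is easy to see in a short sketch. The equation is N = R* x fp x ne x fl x fi x fc x L, so the estimate scales linearly with L. The input values below are placeholders picked only to show that scaling; they are assumptions, not measurements, and the function name `drake` is illustrative.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime_years):
    """Drake equation: estimated number of detectable civilizations in the galaxy.

    r_star: star formation rate per year; f_p: fraction of stars with planets;
    n_e: habitable planets per such star; f_l, f_i, f_c: fractions developing
    life, intelligence, and detectable technology; lifetime_years: how long
    a civilization stays detectable.
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime_years

# Placeholder inputs for illustration only; none of these are measured values.
params = dict(r_star=1.0, f_p=0.5, n_e=1.0, f_l=0.1, f_i=0.1, f_c=0.1)

# Hold everything else fixed and vary only how long civilizations last.
for lifetime in (100, 10_000, 1_000_000):
    print(lifetime, drake(lifetime_years=lifetime, **params))
```

With the other factors held fixed, plausible lifetimes span many orders of magnitude, so the estimate does too.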
link |
01:48:26.180
So from a selfish perspective,
link |
01:48:29.020
but also from a Drake equation perspective,
link |
01:48:32.020
I'm worried about our civilization lasting.
link |
01:48:35.020
And you kind of think about all the ways
link |
01:48:37.620
in which machine learning can be used
link |
01:48:39.140
to design greater weapons of destruction, right?
link |
01:48:45.700
And I mean, one way to ask that
link |
01:48:48.620
if you look sort of 50 years from now,
link |
01:48:50.580
a hundred years from now,
link |
01:48:52.620
would you be more worried about natural pandemics
link |
01:48:55.780
or engineered pandemics?
link |
01:48:59.440
Like who's the better designer of viruses,
link |
01:49:02.640
nature or humans if we look down the line?
link |
01:49:05.980
I think in my view, I would still be worried
link |
01:49:10.140
about the natural pandemics simply because I mean,
link |
01:49:14.300
the capacity of the nature producing this.
link |
01:49:20.740
It does pretty good job, right?
link |
01:49:22.700
Yes.
link |
01:49:23.540
And the motivation for using virus,
link |
01:49:25.280
engineering viruses as a weapon is a weird one
link |
01:49:29.020
because maybe you can correct me on this,
link |
01:49:31.480
but it seems very difficult to target a virus, right?
link |
01:49:35.620
The whole point of a weapon, the way a rocket works,
link |
01:49:38.400
is you have a starting point, you have an end point
link |
01:49:40.100
and you're trying to hit a target,
link |
01:49:42.340
to hit a target with a virus is very difficult.
link |
01:49:44.660
It's basically just, right?
link |
01:49:47.980
The target would be the human species.
link |
01:49:51.940
Oh man.
link |
01:49:52.860
Yeah, I have a hope in us.
link |
01:49:54.780
I'm forever optimistic that we will not,
link |
01:49:58.260
there's insufficient evil in the world
link |
01:50:01.620
to lead to that kind of destruction.
link |
01:50:04.560
Well, I also hope that, I mean, that's what we see.
link |
01:50:07.780
I mean, with the way we are getting connected,
link |
01:50:11.780
the world is getting connected.
link |
01:50:14.460
I think it helps for the world to become more transparent.
link |
01:50:21.660
Yeah.
link |
01:50:22.560
So the information spread is,
link |
01:50:27.100
I think it's one of the key things for the society
link |
01:50:31.660
to become more balanced one way or another.
link |
01:50:36.460
This is something that people disagree with me on,
link |
01:50:38.340
but I do think that the kind of secrecy
link |
01:50:41.900
that governments have.
link |
01:50:43.460
So you're kind of speaking more to the other aspects,
link |
01:50:47.060
like a research community being more open,
link |
01:50:49.700
companies are being more open.
link |
01:50:52.160
Government is still like,
link |
01:50:55.900
we're talking about like military secrets.
link |
01:50:57.860
I think military secrets of the kind
link |
01:51:01.380
that could destroy the world
link |
01:51:03.700
will become also a thing of the 20th century.
link |
01:51:07.300
It'll become more and more open.
link |
01:51:09.320
Yeah.
link |
01:51:10.160
I think nations will lose power in the 21st century,
link |
01:51:13.220
like lose sufficient power towards secrecy.
link |
01:51:15.960
Transparency is more beneficial than secrecy,
link |
01:51:18.860
but of course it's not obvious.
link |
01:51:21.140
Let's hope so.
link |
01:51:22.180
Let's hope so that the governments
link |
01:51:27.180
will become more transparent.
link |
01:51:31.300
What, so we last talked, I think in March or April,
link |
01:51:35.260
what have you learned?
link |
01:51:36.740
How has your philosophical, psychological,
link |
01:51:40.460
biological worldview changed since then?
link |
01:51:43.820
Or you've been studying it nonstop
link |
01:51:46.100
from a computational biology perspective.
link |
01:51:48.900
How has your understanding and thoughts about this virus
link |
01:51:51.140
changed over those months from the beginning to today?
link |
01:51:54.460
One thing that I was really amazed at
link |
01:51:58.660
is how efficient the scientific community was.
link |
01:52:03.140
I mean, and even just judging on this very narrow domain
link |
01:52:10.100
of protein structure and understanding
link |
01:52:13.060
the structural characterization of this virus
link |
01:52:17.600
from the components point of view,
link |
01:52:19.860
whole virus point of view.
link |
01:52:21.460
If you look at SARS, something that happened less than 20,
link |
01:52:31.020
but close enough, 20 years ago,
link |
01:52:34.980
and you see what, when it happened,
link |
01:52:38.500
what was sort of the response by the scientific community,
link |
01:52:42.460
you see that the structure characterizations did occur,
link |
01:52:47.100
but it took several years, right?
link |
01:52:51.660
Now the things that took several years,
link |
01:52:54.940
it's a matter of months, right?
link |
01:52:56.900
So we see that the research pop up.
link |
01:53:01.620
We are at the unprecedented level
link |
01:53:03.940
in terms of the sequencing, right?
link |
01:53:05.980
Never before have we had a single virus sequenced so many times,
link |
01:53:10.980
which allows us to actually trace very precisely
link |
01:53:16.380
the sort of the evolutionary nature of this virus,
link |
01:53:21.380
what happens, and it's not just this virus independently
link |
01:53:27.420
of everything, it's the sequence of this virus
link |
01:53:32.420
linked, anchored to the specific geographic place
link |
01:53:36.420
to specific
link |
01:53:24.540
people, because our genotype influences also
link |
01:53:31.540
the evolution of this, it's always a host pathogen
link |
01:53:35.540
coevolution that, you know, occurs.
link |
01:53:55.540
It'd be cool if we also had a lot more data about,
link |
01:53:58.540
so that the spread of this virus, not maybe,
link |
01:54:02.540
well, it'd be nice if we had it for like contact tracing
link |
01:54:06.540
purposes for this virus, but it'd be also nice if we had it
link |
01:54:09.540
for the study for future viruses to be able to respond
link |
01:54:12.540
and so on, but it's already nice that we have geographical
link |
01:54:15.540
data and the basic data from individual humans, yeah.
link |
01:54:18.540
Exactly, no, I think contact tracing is obviously
link |
01:54:22.540
a key component in understanding
link |
01:54:26.540
the spread of this virus.
link |
01:54:29.540
There is also, there is a number of challenges, right?
link |
01:54:31.540
So XPRIZE is one of them, we
link |
01:54:35.540
just recently took part in
link |
01:54:39.540
this competition, it's the prediction of the
link |
01:54:43.540
number of infections in different regions.
link |
01:54:47.540
Oh, sure.
link |
01:54:48.540
So, you know, obviously the AI
link |
01:54:52.540
is the main topic in those predictions.
link |
01:54:55.540
Yeah, but it's still, the data, I mean, that's a competition,
link |
01:54:59.540
but the data is weak
link |
01:55:03.540
on the training. Like, it's great,
link |
01:55:07.540
it's much more than probably before, but like, it'd be nice if it was like
link |
01:55:11.540
really rich. I talked to Michael Mina from
link |
01:55:15.540
Harvard, I mean, he dreams that the community comes together with like a
link |
01:55:19.540
weather map for viruses, right, like
link |
01:55:23.540
really high resolution sensors on like how
link |
01:55:27.540
from person to person the viruses travel, all the different kinds of viruses, right?
link |
01:55:31.540
Because there's a ton of them, and then you'd be able to tell
link |
01:55:35.540
the story that you've spoken about
link |
01:55:39.540
of the evolution of these viruses, like day to day mutations that
link |
01:55:43.540
are occurring. I mean, that'd be fascinating just from a perspective of
link |
01:55:47.540
study and from the perspective of being able to respond to future pandemics.
link |
01:55:51.540
That's ultimately what I'm worried about. People love
link |
01:55:55.540
books. Is there some three
link |
01:55:59.540
or whatever number of books, technical, fiction, philosophical, that
link |
01:56:03.540
brought you joy in life, had an impact on your life,
link |
01:56:07.540
and maybe some that you would recommend others?
link |
01:56:11.540
I'll give you three very different books, and I also have a special runner up.
link |
01:56:15.540
Honorable mention.
link |
01:56:19.540
I mean, it's an audiobook, and there's
link |
01:56:23.540
some specific reason behind it. So the first book is
link |
01:56:27.540
something that sort of impacted my earlier
link |
01:56:31.540
stage of life, and I'm probably not going to be very original here.
link |
01:56:35.540
It's Bulgakov's Master and Margarita.
link |
01:56:39.540
For a Russian, maybe it's not super original,
link |
01:56:43.540
but it's a really powerful book, even in English.
link |
01:56:47.540
It is incredibly powerful, and
link |
01:56:51.540
I mean, the way it ends.
link |
01:56:55.540
I still have goosebumps when I read
link |
01:56:59.540
the very last sort of, it's called epilogue, where
link |
01:57:03.540
it's just so powerful. What impact did it have on you? What ideas?
link |
01:57:07.540
What insights did you get from it? I was just taken by
link |
01:57:11.540
the fact that
link |
01:57:15.540
you have those parallel lives
link |
01:57:19.540
apart from many centuries, and
link |
01:57:23.540
somehow they got sort of intertwined into
link |
01:57:27.540
one story, and that
link |
01:57:31.540
to me was fascinating. And of course
link |
01:57:35.540
the romantic part of this book is like
link |
01:57:39.540
it's not just romance, it's like the romance
link |
01:57:43.540
empowered by sort of magic, right?
link |
01:57:47.540
And maybe on top of that, you have some irony,
link |
01:57:51.540
which is unavoidable, right? Because it was that
link |
01:57:55.540
Soviet time. But it's very deeply Russian, so that's
link |
01:57:59.540
the wit, the humor, the pain, the love,
link |
01:58:03.540
all of that is one of the books that kind of captures
link |
01:58:07.540
something about Russian culture that people outside of Russia
link |
01:58:11.540
should probably read. I agree. What's the second one? So the second one
link |
01:58:15.540
is again another one that it happened
link |
01:58:19.540
I read it later in my life. I think I read it
link |
01:58:23.540
first time when I was a graduate student.
link |
01:58:27.540
And that's Solzhenitsyn's Cancer Ward.
link |
01:58:31.540
That is an amazingly powerful book.
link |
01:58:35.540
What is it about? It's about, I mean, essentially
link |
01:58:39.540
based on Solzhenitsyn was
link |
01:58:43.540
diagnosed with cancer when he was reasonably young, and he
link |
01:58:47.540
made a full recovery. So this is
link |
01:58:51.540
about a person who was sentenced
link |
01:58:55.540
for life in one of these camps.
link |
01:58:59.540
And he had some cancer,
link |
01:59:03.540
so he was transported back to one of these
link |
01:59:07.540
Soviet republics, I think it was
link |
01:59:11.540
South Asian republics. And the
link |
01:59:15.540
book is about
link |
01:59:19.540
his experience being a
link |
01:59:23.540
prisoner, being a patient in the
link |
01:59:27.540
cancer clinic, in the cancer ward, surrounded
link |
01:59:31.540
by people, many of whom die.
link |
01:59:35.540
But in the way
link |
01:59:39.540
it reads, first of all, later on I
link |
01:59:43.540
read the accounts of the doctors
link |
01:59:47.540
who describe the experiences
link |
01:59:51.540
in the book by the
link |
01:59:55.540
patient as incredibly accurate.
link |
01:59:59.540
So I read that there was some doctor saying that
link |
02:00:03.540
every single doctor should read this book to understand
link |
02:00:07.540
what the patient feels. But
link |
02:00:11.540
again, as with many of Solzhenitsyn's
link |
02:00:15.540
books, it has multiple levels of complexity.
link |
02:00:19.540
And obviously if you look above
link |
02:00:23.540
the cancer and the patient, the
link |
02:00:27.540
tumor that was growing and then disappeared
link |
02:00:31.540
in his
link |
02:00:35.540
body with some consequences, this is
link |
02:00:39.540
allegorically the
link |
02:00:43.540
Soviet, and he actually
link |
02:00:47.540
when he was asked, he said that this is what made him
link |
02:00:51.540
think about this, how to combine these experiences.
link |
02:00:55.540
Him being a part of the Soviet regime,
link |
02:00:59.540
also being a part of the
link |
02:01:03.540
someone sent to Gulag camp,
link |
02:01:07.540
and also someone who experienced cancer
link |
02:01:11.540
in his life. The Gulag Archipelago
link |
02:01:15.540
and this book, these are the works that actually made him
link |
02:01:19.540
receive a Nobel Prize. But to me
link |
02:01:23.540
I've read
link |
02:01:27.540
other books by Solzhenitsyn.
link |
02:01:31.540
This one to me is the most powerful one.
link |
02:01:35.540
And by the way, both this one and the previous one you read in Russian?
link |
02:01:39.540
Yes. So now there is the third book is an English book
link |
02:01:43.540
and it's completely different. So we're switching the gears
link |
02:01:47.540
completely. So this is the book which, it's not even
link |
02:01:51.540
a book, it's an essay by
link |
02:01:55.540
John von Neumann called The Computer and the Brain.
link |
02:01:59.540
And that was the book he was writing
link |
02:02:03.540
knowing that he was dying of cancer.
link |
02:02:07.540
So the book was released back, it's a very thin book.
link |
02:02:11.540
But the power,
link |
02:02:15.540
the intellectual power in this book, in this essay
link |
02:02:19.540
is incredible. I mean you probably know that von Neumann
link |
02:02:23.540
is considered to be one of the biggest
link |
02:02:27.540
thinkers. So his intellectual power was incredible.
link |
02:02:31.540
And you can actually feel this power
link |
02:02:35.540
in this book where the person is writing knowing that he will be,
link |
02:02:39.540
he will die. The book actually got published only after his
link |
02:02:43.540
death back in 1958. He died in 1957.
link |
02:02:47.540
So he tried to put as many
link |
02:02:51.540
ideas that he still
link |
02:02:55.540
hadn't realized.
link |
02:02:59.540
So this book is very difficult
link |
02:03:03.540
to read because every single paragraph
link |
02:03:07.540
is just compact, is
link |
02:03:11.540
filled with these ideas. And the ideas are incredible.
link |
02:03:15.540
Even nowadays, so he tried
link |
02:03:19.540
to put the parallels between the brain
link |
02:03:23.540
computing power, the neural system, and the computers
link |
02:03:27.540
as they were understood. Do you remember what year he was working on this?
link |
02:03:31.540
57. 57. So that was right during his,
link |
02:03:35.540
when he was diagnosed with cancer and he was essentially...
link |
02:03:39.540
Yeah, he's one of those, there's a few folks people mention,
link |
02:03:43.540
I think Ed Witten is another that like
link |
02:03:47.540
everyone that meets them, they say he's just an intellectual powerhouse.
link |
02:03:51.540
Yes. Okay, so who's the honorable mention?
link |
02:03:55.540
And this is, I mean, the reason I put it sort of in a separate section
link |
02:03:59.540
because this is a book that I recently
link |
02:04:03.540
listened to. So it's an audio book.
link |
02:04:07.540
And this is a book called Lab Girl by Hope Jahren.
link |
02:04:11.540
So Hope Jahren, she is a
link |
02:04:15.540
scientist, she's a geochemist that essentially
link |
02:04:19.540
studies the
link |
02:04:23.540
fossil plants. And so she uses
link |
02:04:27.540
this fossil plant, the chemical analysis to understand
link |
02:04:31.540
what was the climate back in
link |
02:04:35.540
a thousand years, hundreds of thousands of years ago.
link |
02:04:39.540
And so something that incredibly
link |
02:04:43.540
touched me by this book, it was narrated by the author.
link |
02:04:47.540
Nice. And it's an incredibly
link |
02:04:51.540
personal story, incredibly. So
link |
02:04:55.540
certain parts of the book, you could actually hear the author crying.
link |
02:04:59.540
And that to me, I mean, I never experienced
link |
02:05:03.540
anything like this, reading the book, but it was like
link |
02:05:07.540
the connection between you and the author.
link |
02:05:11.540
And I think this is really
link |
02:05:15.540
a must read, but even better, a must listen
link |
02:05:19.540
to audio book for anyone who
link |
02:05:23.540
wants to learn about sort of
link |
02:05:27.540
academia, science, research in general, because it's
link |
02:05:31.540
a very personal account about her becoming
link |
02:05:35.540
a scientist. So
link |
02:05:39.540
we're just before New Year's.
link |
02:05:43.540
We talked a lot about some difficult topics of viruses and so on.
link |
02:05:47.540
Do you have some exciting things you're looking forward
link |
02:05:51.540
to in 2021? Some New Year's resolutions,
link |
02:05:55.540
maybe silly or fun, or
link |
02:05:59.540
something very important and fundamental to
link |
02:06:03.540
the world of science or something completely unimportant?
link |
02:06:07.540
Well, I'm definitely looking forward to
link |
02:06:11.540
things becoming normal.
link |
02:06:15.540
So yes, I really miss traveling.
link |
02:06:19.540
Every summer I go
link |
02:06:23.540
to an international summer school. It's called
link |
02:06:27.540
the School of Molecular and Theoretical Biology. It's held in Europe.
link |
02:06:31.540
It's organized by very good friends of mine. And this is
link |
02:06:35.540
the school for gifted kids from all over the world, and
link |
02:06:39.540
they're incredibly bright. It's like every time I go there, it's like, you know,
link |
02:06:43.540
it's a highlight of the year. And
link |
02:06:47.540
we couldn't make it this August, so we
link |
02:06:51.540
did this school remotely, but it's different.
link |
02:06:55.540
So I am definitely looking forward to next August
link |
02:06:59.540
coming there. One of
link |
02:07:03.540
my personal resolutions, I realized that
link |
02:07:07.540
being in the house and working from home,
link |
02:07:11.540
I realized that actually
link |
02:07:15.540
I apparently missed a lot
link |
02:07:19.540
spending time with my family,
link |
02:07:23.540
believe it or not. So you typically, with all the
link |
02:07:27.540
research and teaching and
link |
02:07:31.540
everything related to the academic life,
link |
02:07:35.540
I mean, you get distracted. And so
link |
02:07:39.540
you don't feel that
link |
02:07:43.540
the fact that you are away from your family doesn't affect you
link |
02:07:47.540
because you are naturally distracted by other things.
link |
02:07:51.540
So this time I realized that
link |
02:07:55.540
that's so important, right? Spending your time with
link |
02:07:59.540
the family, with your kids. And so that
link |
02:08:03.540
would be my new year resolution and actually trying to
link |
02:08:07.540
spend as much time as possible. Even when the world opens up.
link |
02:08:11.540
Yeah, that's a beautiful message. That's a beautiful reminder.
link |
02:08:15.540
I asked you if there's a Russian poem
link |
02:08:19.540
that I could read, that I could force you to read, and you said, okay, fine, sure.
link |
02:08:23.540
Do you mind reading?
link |
02:08:27.540
And you said that no paper needed.
link |
02:08:31.540
So this poem was written by my namesake,
link |
02:08:35.540
another Dmitry, Dmitry Kimelfeld.
link |
02:08:39.540
It's a recent poem and it's
link |
02:08:43.540
called Sorceress, Vedma,
link |
02:08:47.540
in Russian, or actually
link |
02:08:51.540
Koldunya. So that's sort of another sort of connotation of
link |
02:08:55.540
sorceress or witch. And I really like it
link |
02:08:59.540
and it's one of just a handful of poems I actually
link |
02:09:03.540
can recall by heart. I also have a very strong
link |
02:09:07.540
association when I read this poem with Master and
link |
02:09:11.540
Margarita, the main female character,
link |
02:09:15.540
Margarita. And also it's
link |
02:09:19.540
about, it's happening about the same time we're talking
link |
02:09:23.540
now, so around New Year,
link |
02:09:27.540
around Christmas. Do you mind reading it in Russian?
link |
02:09:31.540
I'll give it a try.
link |
02:10:01.540
So you narrowed your eyes,
link |
02:10:05.540
that anyone who was blessed
link |
02:10:09.540
was ready to give their soul to the devil
link |
02:10:13.540
for this witch's connection.
link |
02:10:17.540
And I, without prejudice,
link |
02:10:21.540
ran out to feel your
link |
02:10:25.540
amazing breath on your lips,
link |
02:10:29.540
to remember how you flew above the earth
link |
02:10:33.540
in a white view,
link |
02:10:37.540
in a white haze, in a white mist.
link |
02:10:41.540
That's beautiful. I love how it captures a moment of longing
link |
02:10:45.540
and maybe love even.
link |
02:10:49.540
Yes. To me it has a lot of meaning about
link |
02:10:53.540
this something that is happening,
link |
02:10:57.540
something that is far away, but still very close to you.
link |
02:11:01.540
And yes, it's the winter.
link |
02:11:05.540
There's something magical about winter, isn't there?
link |
02:11:09.540
I don't know how to translate it, but a kiss in winter
link |
02:11:13.540
is interesting. Lips in winter and all that kind of stuff.
link |
02:11:17.540
It's beautiful. Russian has a way. It has a reason, Russian poetry
link |
02:11:21.540
is just, I'm a fan of poetry in both languages, but English
link |
02:11:25.540
doesn't capture some of the magic that Russian seems to, so
link |
02:11:29.540
thank you for doing that. That was awesome. Dmitry,
link |
02:11:33.540
it's great to talk to you again. It's contagious
link |
02:11:37.540
how much you love what you do, how much you love life, so I really appreciate
link |
02:11:41.540
you taking the time to talk today. And thank you for having me.
link |
02:11:45.540
Thanks for listening to this conversation with Dmitry Korkin, and thank you to our
link |
02:11:49.540
sponsors. Brave Browser, NetSuite Business Management
link |
02:11:53.540
Software, Magic Spoon Low Carb Cereal, and
link |
02:11:57.540
8sleep Self Cooling Mattress. So the choice is
link |
02:12:01.540
browsing privacy, business success, healthy diet, or comfortable
link |
02:12:05.540
sleep. Choose wisely, my friends. And if you wish,
link |
02:12:09.540
click the sponsor links below to get a discount and to support this podcast.
link |
02:12:13.540
And now, let me leave you with some words from Jeffrey Eugenides.
link |
02:12:17.540
Biology gives you a brain.
link |
02:12:21.540
Life turns it into a mind. Thank you for listening,
link |
02:12:25.540
and hope to see you next time.