
Manolis Kellis: Biology of Disease | Lex Fridman Podcast #133



link |
00:00:00.000
The following is a conversation with Manolis Kellis, his third time on the podcast.
link |
00:00:05.640
He is a professor at MIT and head of the MIT Computational Biology Group.
link |
00:00:11.380
This time we went deep on the science, biology, and genetics.
link |
00:00:17.160
So this is a bit of an experiment.
link |
00:00:19.920
Manolis went back and forth between the basics of biology and the latest state of the art
link |
00:00:25.120
in the research.
link |
00:00:26.120
He's a master at this, so I just sat back and enjoyed the ride.
link |
00:00:31.240
This conversation happened at 7am, so it's yet another podcast episode after an all-nighter
link |
00:00:37.400
for me.
link |
00:00:38.400
And once again, since the universe has a sense of humor, this one was a tough one for my
link |
00:00:44.040
brain to keep up, but I did my best and I never shy away from a good challenge.
link |
00:00:50.480
Quick mention of each sponsor, followed by some thoughts related to the episode.
link |
00:00:55.960
First is SEMrush, the most advanced SEO optimization tool I've ever come across.
link |
00:01:02.160
I don't like looking at numbers, but someone probably should, it helps you make good decisions.
link |
00:01:08.920
Second is Pessimist Archive, they're back, one of my favorite history podcasts on why
link |
00:01:13.920
people resist new things from recorded music to umbrellas to cars, chess, coffee, and the
link |
00:01:20.840
elevator.
link |
00:01:22.460
Third is Eight Sleep, a mattress that cools itself, measures heart rate variability, has an app,
link |
00:01:28.320
and has given me yet another reason to look forward to sleep, including the all-important
link |
00:01:33.160
power nap.
link |
00:01:34.840
And finally, BetterHelp, online therapy when you want to face your demons with a licensed
link |
00:01:40.080
professional, not just by doing the David Goggins-like physical challenges like I seem
link |
00:01:45.440
to do on occasion.
link |
00:01:47.960
Please check out these sponsors in the description to get a discount and to support this podcast.
link |
00:01:54.200
As a side note, let me say that biology in the brain and in the various systems of the
link |
00:01:59.280
body fills me with awe every time I think about how such a chaotic mess coming from its humble
link |
00:02:05.800
origins in the ocean was able to achieve such incredibly complex and robust mechanisms of
link |
00:02:11.940
life that survived despite all the forces of nature that want to destroy it.
link |
00:02:17.920
It is so unlike the computing systems we humans have engineered that it makes me feel that
link |
00:02:22.920
in order to create artificial general intelligence and artificial consciousness, we may have
link |
00:02:28.420
to completely rethink how we engineer computational systems.
link |
00:02:33.560
If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts, follow
link |
00:02:38.520
on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
link |
00:02:44.880
And now, here's my conversation with Manolis Kellis.
link |
00:02:49.760
So your group at MIT is trying to understand the molecular basis of human disease.
link |
00:02:54.880
What are some of the biggest challenges in your view?
link |
00:02:57.720
Don't get me started.
link |
00:02:58.720
I mean, understanding human disease is the most complex challenge in modern science.
link |
00:03:06.240
So because human disease is as complex as the human genome, it is as complex as the
link |
00:03:13.340
human brain, and it is in many ways, even more complex because the more we understand
link |
00:03:20.680
disease complexity, the more we start understanding genome complexity and epigenome complexity
link |
00:03:27.000
and brain circuitry complexity and immune system complexity and cancer complexity and
link |
00:03:31.200
so on and so forth.
link |
00:03:32.280
So traditionally, human disease was following basic biology.
link |
00:03:39.400
You would basically understand basic biology in model organisms like, you know, mouse and
link |
00:03:44.360
fly and yeast.
link |
00:03:46.040
You would understand sort of mammalian biology and animal biology and eukaryotic biology
link |
00:03:53.400
in sort of progressive layers of complexity, getting closer to human phylogenetically.
link |
00:03:59.920
And you would do perturbation experiments in those species to see if I knock out a gene,
link |
00:04:06.640
what happens?
link |
00:04:07.960
And based on the knocking out of these genes, you would basically then have a way to drive
link |
00:04:12.720
human biology because you would sort of understand the functions of these genes.
link |
00:04:16.880
And then if you find that a human gene locus, something that you've mapped from human genetics
link |
00:04:23.660
to that gene is related to a particular human disease, you'd say, aha, now I know the function
link |
00:04:28.760
of the gene from the model organisms.
link |
00:04:31.440
I can now go and understand the function of that gene in human.
link |
00:04:37.120
But this is all changing.
link |
00:04:38.320
This has changed dramatically.
link |
00:04:39.440
So that was the old way of doing basic biology.
link |
00:04:41.760
You would start with the animal models, the eukaryotic models, the mammalian models, and
link |
00:04:46.280
then you would go to human.
link |
00:04:48.680
Human genetics has been so transformed in the last decade or two that human genetics
link |
00:04:55.320
is now actually driving the basic biology.
link |
00:04:58.420
There is more genetic mutation information in the human genome than there will ever be
link |
00:05:04.300
in any other species.
link |
00:05:06.240
What do you mean by mutation information?
link |
00:05:08.440
So perturbations is how you understand systems.
link |
00:05:11.380
So an engineer builds systems and then they know how they work from the inside out.
link |
00:05:16.120
A scientist studies systems through perturbations.
link |
00:05:20.360
You basically say, if I poke that balloon, what's going to happen?
link |
00:05:23.040
And I'm going to film it in super high resolution, understand, I don't know, aerodynamics or
link |
00:05:26.480
fluid dynamics if it's filled with water, et cetera.
link |
00:05:28.840
So you can then make experimentation by perturbation and then the scientific process is sort of
link |
00:05:33.960
building models that best fit the data, designing new experiments that best test your models
link |
00:05:41.120
and challenge your models and so on and so forth.
link |
00:05:43.320
This is the same thing with science.
link |
00:05:44.800
Basically if you're trying to understand biological science, you basically want to do perturbations
link |
00:05:49.440
that then drive the models.
link |
00:05:54.600
So how do these perturbations allow you to understand disease?
link |
00:05:58.320
So if you know that a gene is related to disease, you don't want to just know that it's related
link |
00:06:04.120
to the disease.
link |
00:06:05.120
You want to know what is the disease mechanism because you want to go and intervene.
link |
00:06:09.960
So the way that I like to describe it is that traditionally epidemiology, which is basically
link |
00:06:17.240
the study of disease, you know, sort of the observational study of disease has been about
link |
00:06:23.000
correlating one thing with another thing.
link |
00:06:25.760
So if you have a lot of people with liver disease who are also alcoholics, you might
link |
00:06:29.880
say, well, maybe the alcoholism is driving the liver disease or maybe those who have
link |
00:06:34.360
liver disease self medicate with alcohol.
link |
00:06:36.780
So the connection could be either way.
link |
00:06:40.120
With genetic epidemiology, it's about correlating changes in genome with phenotypic differences
link |
00:06:47.640
and then you know the direction of causality.
link |
00:06:50.120
So if you know that a particular gene is related to the disease, you can basically say, okay,
link |
00:06:58.120
perturbing that gene in mouse causes the mice to have X phenotype.
link |
00:07:03.860
So perturbing that gene in human causes the humans to have the disease.
link |
00:07:08.240
So I can now figure out what are the detailed molecular phenotypes in the human that are
link |
00:07:14.680
related to that organismal phenotype in the disease.
link |
00:07:18.820
So it's all about understanding disease mechanism, understanding what are the pathways, what
link |
00:07:22.960
are the tissues, what are the processes that are associated with the disease so that we
link |
00:07:27.560
know how to intervene.
link |
00:07:29.000
You can then prescribe particular medications that also alter these processes.
link |
00:07:33.360
You can prescribe lifestyle changes that also affect these processes and so on and so forth.
link |
00:07:37.920
That's such a beautiful puzzle to try to solve.
link |
00:07:41.040
Like what kind of perturbations eventually have this ripple effect that leads to disease
link |
00:07:45.480
across the population.
link |
00:07:46.480
And then you study that for animals or mice first and then see how that might possibly
link |
00:07:51.880
connect to humans.
link |
00:07:54.680
How hard is that puzzle of trying to figure out how little perturbations might lead to,
link |
00:08:01.340
in a stable way, to a disease?
link |
00:08:04.200
In animals, we make the puzzle simpler because we perturb one gene at a time.
link |
00:08:11.040
That's the beauty of this, the power of animal models.
link |
00:08:13.500
You can basically decouple the perturbations.
link |
00:08:15.800
You only do one perturbation and you only do strong perturbations at a time.
link |
00:08:21.240
In human, the puzzle is incredibly complex because obviously you don't do human experimentation.
link |
00:08:28.600
You wait for natural selection and natural genetic variation to basically do its own
link |
00:08:34.720
experiments, which it has been doing for hundreds and thousands of years in the human population
link |
00:08:40.560
and for hundreds of thousands of years across the history leading to the human population.
link |
00:08:49.320
So you basically take this natural genetic variation that we all carry within us.
link |
00:08:54.440
Every one of us carries 6 million perturbations.
link |
00:08:58.280
So I've done 6 million experiments on you, 6 million experiments on me, 6 million experiments
link |
00:09:02.920
on every one of 7 billion people on the planet.
link |
00:09:06.400
What's the 6 million correspond to?
link |
00:09:08.600
6 million unique genetic variants that are segregating in the human population.
link |
00:09:14.840
Every one of us carries millions of polymorphic sites, poly, many, morph, forms.
link |
00:09:22.880
Polymorphic means many forms, variants.
link |
00:09:25.080
That basically means that every one of us has single nucleotide alterations that we
link |
00:09:29.560
have inherited from mom and from dad that basically can be thought of as tiny little
link |
00:09:34.680
perturbations.
link |
00:09:36.320
Most of them don't do anything, but some of them lead to all of the phenotypic differences
link |
00:09:42.480
that we see between us.
link |
00:09:43.920
The reason why two twins are identical is because these variants completely determine
link |
00:09:48.960
the way that I'm going to look at exactly 93 years of age.
link |
00:09:52.520
How happy are you with this kind of data set?
link |
00:09:54.720
Is the human population of Earth large enough?
link |
00:09:59.240
Is that too big, too small?
link |
00:10:01.680
Yeah, so is it large enough is a power analysis question.
link |
00:10:07.360
In every one of our grants, we do a power analysis based on what is the effect size
link |
00:10:11.440
that I would like to detect and what is the natural variation in the two forms.
link |
00:10:19.200
Every time you do a perturbation, you're asking, I'm changing form A into form B. Form A has
link |
00:10:25.240
some natural phenotypic variation around it and form B has some natural phenotypic variation
link |
00:10:30.160
around it.
link |
00:10:31.240
If those variances are large and the differences between the mean of A and the mean of B are
link |
00:10:36.200
small, then you have very little power.
link |
00:10:38.920
The further the means go apart, that's the effect size, the more power you have, and
link |
00:10:44.600
the smaller the standard deviation, the more power you have.
link |
00:10:48.760
So basically when you're asking, is that sufficiently large, certainly not for everything, but we
link |
00:10:54.440
already have enough power for many of the stronger effects in the tighter distributions.
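To make the two power levers concrete (distance between the means, and the spread around each mean), here is a minimal Python sketch. It assumes a two-sided, two-sample z-test with equal group sizes and a shared standard deviation, a textbook simplification rather than the power analysis used in the group's grants; the function two_group_power and its parameters are hypothetical.

```python
from statistics import NormalDist


def two_group_power(mean_a: float, mean_b: float, sd: float, n_per_group: int,
                    alpha: float = 0.05) -> float:
    """Approximate power to distinguish form A from form B (hypothetical helper).

    Assumes a two-sided, two-sample z-test, equal group sizes, and the same
    standard deviation `sd` of the phenotype around each form's mean.
    """
    z = NormalDist()                        # standard normal
    effect = abs(mean_a - mean_b)           # how far apart the means are
    se = sd * (2.0 / n_per_group) ** 0.5    # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)       # two-sided critical value
    shift = effect / se                     # expected value of the test statistic
    # Probability the statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


print(round(two_group_power(0.0, 0.5, 1.0, 64), 2))  # means 0.5 apart, sd 1: roughly 0.81
print(round(two_group_power(0.0, 0.5, 0.5, 64), 2))  # same gap, tighter spread
print(round(two_group_power(0.0, 1.0, 1.0, 64), 2))  # wider gap, same spread
```

Under these assumptions, halving the standard deviation or doubling the gap between the means pushes power close to 1 at the same sample size, which is the qualitative point about strong effects in tight distributions.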
link |
00:11:01.240
So that's the hopeful message that there exist parts of the genome that have a strong effect
link |
00:11:09.840
that has a small variance.
link |
00:11:13.200
That's exactly right.
link |
00:11:14.200
Unfortunately, those perturbations are the basis of disease in many cases.
link |
00:11:18.400
So it's not a hopeful message.
link |
00:11:20.800
Sometimes it's a terrible message.
link |
00:11:22.720
It's basically, well, some people are sick, but if we can figure out what are these contributors
link |
00:11:27.880
to sickness, we can then help make them better and help make many other people better who don't
link |
00:11:32.760
carry that exact mutation, but who carry mutations on the same pathways.
link |
00:11:38.960
And that's what we like to call the allelic series of a gene.
link |
00:11:42.800
You basically have many perturbations of the same gene in different people, each with a
link |
00:11:49.580
different frequency in the human population and each with a different effect on the individual
link |
00:11:55.200
that carries them.
link |
00:11:56.200
So you said in the past there would be these small experiments on perturbations and animal
link |
00:12:03.000
models.
link |
00:12:04.000
What does this puzzle solving process look like today?
link |
00:12:08.400
So we basically have something like 7 billion people on the planet and every one of them
link |
00:12:13.180
carries something like 6 million mutations.
link |
00:12:16.760
You basically have an enormous matrix of genotype by phenotype by systematically measuring the
link |
00:12:25.000
phenotype of these individuals.
link |
00:12:27.960
And the traditional way of measuring this phenotype has been to look at one trait at
link |
00:12:32.640
a time.
link |
00:12:33.700
You would gather families and you would sort of paint the pedigrees of a strong effect,
link |
00:12:40.160
what we like to call Mendelian mutation, so a mutation that gets transmitted in a dominant
link |
00:12:47.320
or a recessive, but strong effect form where basically one locus plays a very big role
link |
00:12:53.240
in that disease.
link |
00:12:54.240
And you could then look at carriers versus non carriers in one family, carriers versus
link |
00:12:59.560
non carriers in another family and do that for hundreds, sometimes thousands of families
link |
00:13:04.480
and then trace these inheritance patterns and then figure out what is the gene that
link |
00:13:08.360
plays that role.
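As an illustration of the dominant versus recessive strong-effect patterns traced through pedigrees, the sketch below enumerates the four equally likely allele combinations a child can inherit at a single locus. It assumes one fully penetrant disease allele labeled "A"; the helper affected_fraction is hypothetical.

```python
from fractions import Fraction
from itertools import product


def affected_fraction(parent1: str, parent2: str, mode: str) -> Fraction:
    """Expected fraction of affected children at one Mendelian locus.

    Hypothetical helper for illustration. Genotypes are two-letter strings
    such as "Aa", where "A" is the (assumed fully penetrant) disease allele.
    mode is "dominant" (one copy causes disease) or "recessive" (two copies).
    """
    if mode not in ("dominant", "recessive"):
        raise ValueError("mode must be 'dominant' or 'recessive'")
    copies_needed = 1 if mode == "dominant" else 2
    # Each parent transmits one of its two alleles; all 4 pairings are equally likely.
    children = [a + b for a, b in product(parent1, parent2)]
    affected = sum(1 for child in children if child.count("A") >= copies_needed)
    return Fraction(affected, len(children))


# Carrier x non-carrier, dominant trait: half of the children are affected.
print(affected_fraction("Aa", "aa", "dominant"))   # 1/2
# Two carriers, recessive trait: a quarter of the children are affected.
print(affected_fraction("Aa", "Aa", "recessive"))  # 1/4
```

Comparing observed carrier and non-carrier counts across many families against ratios like these is what lets a single strong-effect locus stand out.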
link |
00:13:09.500
Is this the matrix that you're showing in talks or lectures?
link |
00:13:14.420
So that matrix is the input to the stuff that I show in talks.
link |
00:13:21.000
So basically that matrix has traditionally been strong effect genes.
link |
00:13:24.980
What the matrix looks like now is instead of pedigrees, instead of families, you basically
link |
00:13:29.880
have thousands and sometimes hundreds of thousands of unrelated individuals, each with all of
link |
00:13:36.720
their genetic variants and each with their phenotype, for example, height or lipids or,
link |
00:13:43.520
you know, whether they're sick or not for a particular trait.
link |
00:13:48.080
That has been the modern view instead of going to families, going to unrelated individuals
link |
00:13:53.080
with one phenotype at a time.
link |
00:13:55.760
And what we're doing now as we're maturing in all of these sciences is that we're doing
link |
00:14:00.960
this in the context of large medical systems or enormous cohorts that are very well phenotyped
link |
00:14:07.960
across hundreds of phenotypes, sometimes with their complete electronic health records.
link |
00:14:13.720
So you can now start relating not just one gene segregating in one family, not just thousands
link |
00:14:19.640
of variants segregating with one phenotype, but now you can do millions of variants versus
link |
00:14:25.080
hundreds of phenotypes.
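A minimal sketch of relating many variants to many phenotypes at once, assuming unrelated individuals, genotypes coded as allele dosages (0, 1, or 2 copies), and one simple linear regression per variant-phenotype pair with no covariates. Real studies add covariates, ancestry correction, and significance testing; association_matrix is a hypothetical name and the data are simulated.

```python
import numpy as np


def association_matrix(genotypes: np.ndarray, phenotypes: np.ndarray) -> np.ndarray:
    """Marginal regression slopes for every variant-phenotype pair (hypothetical helper).

    genotypes:  (n_individuals, n_variants) allele dosages, 0, 1, or 2
    phenotypes: (n_individuals, n_phenotypes) quantitative traits
    returns:    (n_variants, n_phenotypes) slope of each phenotype regressed
                on each variant alone; monomorphic variants get slope 0
    """
    g = genotypes - genotypes.mean(axis=0)     # center each variant
    y = phenotypes - phenotypes.mean(axis=0)   # center each phenotype
    ss_g = (g ** 2).sum(axis=0)                # per-variant sum of squares
    cross = g.T @ y                            # per-pair sum of cross-products
    return np.divide(cross, ss_g[:, None], out=np.zeros_like(cross),
                     where=ss_g[:, None] > 0)


# Simulated cohort: 2,000 unrelated people, 50 variants, 3 phenotypes.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(2000, 50))      # dosages at allele frequency 0.3
Y = rng.normal(size=(2000, 3))                 # phenotypes that are pure noise...
Y[:, 0] += 0.5 * G[:, 7]                       # ...except variant 7 shifts phenotype 0
B = association_matrix(G, Y)
# The strongest slope should be variant 7 on phenotype 0.
print(np.unravel_index(np.abs(B).argmax(), B.shape))
```

The same matrix product scales from 50 variants and 3 phenotypes to millions of variants and hundreds of phenotypes, which is why the genotype-by-phenotype view is computationally natural.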
link |
00:14:27.120
And as a computer scientist, I mean, deconvolving that matrix, partitioning it into the layers
link |
00:14:33.880
of biology that are associated with every one of these elements is a dream come true.
link |
00:14:40.160
It's like the world's greatest puzzle.
link |
00:14:42.840
And you can now solve that puzzle by throwing in more and more knowledge about the function
link |
00:14:50.120
of different genomic regions and how these functions are changed across tissues and in
link |
00:14:56.520
the context of disease.
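One generic way to start partitioning such a matrix into shared components is a low-rank factorization. The sketch below uses a truncated singular value decomposition as a stand-in, an assumption for illustration rather than the method described in this conversation: each component pairs a set of variant loadings with a set of phenotype loadings, a crude analogue of one biological layer. low_rank_layers is hypothetical and the matrix is simulated.

```python
import numpy as np


def low_rank_layers(assoc: np.ndarray, k: int):
    """Truncated SVD of a variant-by-phenotype matrix (generic stand-in technique).

    Returns variant loadings (n_variants, k), component strengths (k,),
    phenotype loadings (k, n_phenotypes), and the fraction of the matrix's
    squared magnitude captured by the top k components.
    """
    u, s, vt = np.linalg.svd(assoc, full_matrices=False)
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return u[:, :k], s[:k], vt[:k, :], explained


# Simulated matrix: 200 variants x 30 phenotypes driven by 2 hidden "layers" plus noise.
rng = np.random.default_rng(1)
hidden_variant_effects = rng.normal(size=(200, 2))
hidden_phenotype_effects = rng.normal(size=(2, 30))
A = hidden_variant_effects @ hidden_phenotype_effects + 0.05 * rng.normal(size=(200, 30))

var_load, strengths, pheno_load, explained = low_rank_layers(A, 2)
print(var_load.shape, strengths.shape, pheno_load.shape, round(explained, 3))
```

Because two hidden layers generated almost all of the signal here, two components capture nearly the whole matrix; with real data, interpreting components as biology requires the functional annotations (tissues, regulatory regions) described above.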
link |
00:14:58.100
And that's what my group and many other groups are doing.
link |
00:15:00.720
We're trying to systematically relate this genetic variation with molecular variation
link |
00:15:05.760
at the expression level of the genes, at the epigenomic level of the gene regulatory circuitry,
link |
00:15:12.700
and at the cellular level of what are the functions that are happening in those cells,
link |
00:15:17.020
at the single cell level using single cell profiling, and then relate all that vast amount
link |
00:15:22.340
of knowledge computationally with the thousands of traits that each of these thousands
link |
00:15:29.160
of variants are perturbing.
link |
00:15:30.800
I mean, this is something we talked about, I think last time.
link |
00:15:34.280
So there's these effects at different levels that happen.
link |
00:15:36.480
You said at a single cell level, you're trying to see things that happen due to certain perturbations.
link |
00:15:42.800
And then it's not just like a puzzle of perturbation and disease.
link |
00:15:49.560
It's perturbation then effect at a cellular level, then at an organ level, a body, like,
link |
00:15:57.660
how do you disassemble this into like what your group is working on?
link |
00:16:02.760
You're basically taking a bunch of the hard problems in the space.
link |
00:16:06.560
How do you break apart a difficult disease and break it apart into problems that you,
link |
00:16:13.520
into puzzles that you can now start solving?
link |
00:16:15.520
So there's a struggle here.
link |
00:16:17.380
Computer scientists love hard puzzles and they're like, oh, I want to build a method that just
link |
00:16:22.120
deconvolves the whole thing computationally.
link |
00:16:24.920
And that's very tempting and it's very appealing, but biologists just like to decouple that
link |
00:16:31.640
complexity experimentally, to just like peel off layers of complexity experimentally.
link |
00:16:36.080
And that's what many of these modern tools that my group and others have both developed
link |
00:16:40.380
and used make possible.
link |
00:16:41.600
The fact that we can now figure out tricks for peeling off these layers of complexity
link |
00:16:46.760
by testing one cell type at a time or by testing one cell at a time.
link |
00:16:53.080
And you could basically say, what is the effect of these genetic variants associated with
link |
00:16:56.380
Alzheimer's on human brain?
link |
00:16:59.360
Human brain sounds like, oh, it's an organ, of course, just go one organ at a time.
link |
00:17:04.320
But human brain has of course, dozens of different brain regions and within each of these brain
link |
00:17:09.500
regions, dozens of different cell types and every single type of neuron, every single
link |
00:17:15.080
type of glial cell between astrocytes, oligodendrocytes, microglia, between all of the neural cells
link |
00:17:24.160
and the vascular cells and the immune cells that are co-inhabiting the brain between the
link |
00:17:29.880
different types of excitatory and inhibitory neurons that are sort of interacting with
link |
00:17:34.440
each other between different layers of neurons in the cortical layers.
link |
00:17:39.300
Every single one of these has a different type of function to play in cognition, in
link |
00:17:47.920
interaction with the environment, in maintenance of the brain, in energetic needs, in feeding
link |
00:17:55.280
the brain with blood, with oxygen, in clearing out the debris that are resulting from the
link |
00:18:01.680
super high energy production of cognition in humans.
link |
00:18:06.940
So all of these things are basically potentially deconvolvable computationally, but experimentally,
link |
00:18:17.040
you can just do single cell profiling of dozens of regions of the brain across hundreds of
link |
00:18:21.640
individuals across millions of cells.
link |
00:18:24.640
And then now you have pieces of the puzzle that you can then put back together to understand
link |
00:18:31.440
that complexity.
link |
00:18:32.440
I mean, first of all, the cells in the human brain are the most, maybe I'm romanticizing
link |
00:18:39.400
it, but cognition seems to be very complicated.
link |
00:18:42.520
So separating into the function, breaking Alzheimer's down to the cellular level seems
link |
00:18:53.520
very challenging.
link |
00:18:56.340
Is it basically that you're trying to find a way that some perturbation in the genome results
link |
00:19:05.200
in some obvious major dysfunction in the cell.
link |
00:19:11.920
You're trying to find something like that.
link |
00:19:14.400
Exactly.
link |
00:19:15.400
So what does human genetics do?
link |
00:19:17.120
Human genetics basically looks at the whole path from genetic variation all the way to
link |
00:19:21.640
disease.
link |
00:19:22.640
So human genetics has basically taken thousands of Alzheimer's cases and thousands of controls
link |
00:19:31.640
matched for age, for sex, for environmental backgrounds and so on and so forth.
link |
00:19:38.440
And then looked at that map where you're asking, what are the individual genetic perturbations
link |
00:19:44.500
and how are they related to all the way to Alzheimer's disease?
link |
00:19:48.520
And that has actually been quite successful.
link |
00:19:51.280
So we now have more than 27 different loci, these are genomic regions that are associated
link |
00:19:57.680
with Alzheimer's at this end-to-end level.
link |
00:20:02.420
But the moment you sort of break up that very long path into smaller levels, you can basically
link |
00:20:07.400
say from genetics, what are the epigenomic alterations at the level of gene regulatory
link |
00:20:13.480
elements where that genetic variant perturbs the control region nearby.
link |
00:20:19.160
That effect is much larger.
link |
00:20:21.840
You mean much larger in terms of this down the line impact or?
link |
00:20:25.480
It's much larger in terms of the measurable effect, this A versus B variance is actually
link |
00:20:31.120
so much more cleanly defined when you go to the shorter branches.
link |
00:20:35.800
Because for one genetic variant to affect Alzheimer's, that's a very long path.
link |
00:20:40.760
That basically means that in the context of millions of these 6 million variants that
link |
00:20:43.940
every one of us carries, that one single nucleotide has a detectable effect all the way to the
link |
00:20:51.040
end.
link |
00:20:52.040
I mean, it's just mind boggling that that's even possible, but indeed there are such effects.
link |
00:20:57.700
So the hope, or scientifically speaking, the most effective place to
link |
00:21:03.000
detect the alteration that results in disease is earlier on in the pipeline, as early as
link |
00:21:10.640
possible.
link |
00:21:11.640
It's a trade off.
link |
00:21:12.680
If you go very early on in the pipeline, now each of these epigenomic alterations, for
link |
00:21:17.800
example, this enhancer control region is active maybe 50% less, which is a dramatic effect.
link |
00:21:25.500
Now you can ask, well, how much does changing one regulatory region in the genome in one
link |
00:21:29.680
cell type change disease?
link |
00:21:31.280
Well, that path is now long.
link |
00:21:33.920
So if you instead look at expression, the path between genetic variation and the expression
link |
00:21:39.680
of one gene goes through many enhancer regions, and therefore it's a subtler effect at the
link |
00:21:44.560
gene level.
link |
00:21:45.560
But then now you're closer because one gene is acting in the context of only 20,000 other
link |
00:21:51.360
genes as opposed to one enhancer acting in the context of 2 million other enhancers.
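The trade-off between short and long paths can be sketched with a simple model, assuming a linear causal chain of standardized variables with no other paths, where the end-to-end correlation is the product of the per-link correlations (Wright's path-tracing rule). The sample size uses Fisher's z approximation at the conventional genome-wide threshold of 5e-8 and 80% power. All link correlations below are made-up numbers, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def end_to_end_correlation(link_correlations):
    """Correlation between the start and end of a linear chain of standardized
    variables with no other paths: the product of per-link correlations."""
    return math.prod(link_correlations)


def n_to_detect(r: float, alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate sample size to detect a correlation r (two-sided test),
    using Fisher's z-transform: n = ((z_alpha + z_beta) / atanh(r))**2 + 3."""
    if r == 0:
        raise ValueError("a zero correlation cannot be detected")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(abs(r))) ** 2 + 3)


# Hypothetical per-link correlations: variant -> enhancer -> gene -> cell -> disease.
links = [0.5, 0.3, 0.3, 0.2]
for depth in range(1, len(links) + 1):
    r = end_to_end_correlation(links[:depth])
    print(f"{depth} link(s): r = {r:.4f}, n needed = {n_to_detect(r)}")
```

Under these made-up numbers, the single variant-to-enhancer link needs on the order of a hundred samples, while the full variant-to-disease path needs hundreds of thousands, which mirrors the point that shorter branches give much larger, more measurable effects.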
link |
00:21:57.200
So you basically now have genetic, epigenomic, the circuitry, transcriptomic, the gene expression
link |
00:22:04.040
control, and then cellular, where you can basically say, I can measure various properties
link |
00:22:09.600
of those cells.
link |
00:22:11.160
What is the calcium influx rate when I have this genetic variation?
link |
00:22:17.560
What is the synaptic density?
link |
00:22:19.760
What is the electric impulse conductivity and so on and so forth?
link |
00:22:24.500
So you can measure things along this path to disease, and you can also measure endophenotypes.
link |
00:22:32.660
You can basically measure your brain activity.
link |
00:22:37.460
You can do imaging in the brain.
link |
00:22:39.680
You can basically measure, I don't know, the heart rate, the pulse, the lipids, the amount
link |
00:22:44.440
of blood secreted and so on and so forth.
link |
00:22:46.700
And then through all of that, you can basically get at the path to causality, the path to
link |
00:22:52.760
disease.
link |
00:22:55.320
And is there something beyond cellular?
link |
00:22:57.680
So you mentioned lifestyle interventions or changes as a way to, or like be able to prescribe
link |
00:23:05.480
changes in lifestyle.
link |
00:23:07.840
Like what about organs?
link |
00:23:09.360
What about like the function of the body as a whole?
link |
00:23:13.200
Yeah, absolutely.
link |
00:23:14.200
So basically when you go to your doctor, they always measure, you know, your pulse.
link |
00:23:18.200
They always measure your height.
link |
00:23:19.200
They always measure your weight, you know, your BMI.
link |
00:23:21.240
So basically these are just very basic variables.
link |
00:23:24.180
But with digital devices nowadays, you can start measuring hundreds of variables for
link |
00:23:27.960
every individual.
link |
00:23:29.600
You can also phenotype Alzheimer's patients cognitively through tests.
link |
00:23:37.300
There are cognitive tests that you can measure, that you typically do for cognitive decline,
link |
00:23:43.720
these Mini-Mental State Examinations, which ask specific questions.
link |
00:23:48.500
You can think of sort of enlarging the set of cognitive tests.
link |
00:23:51.980
So in the mouse, for example, you do experiments for how do they get out of mazes?
link |
00:23:55.760
How do they find food?
link |
00:23:57.280
Whether they recall a fear, whether they shake in a new environment and so on and so forth.
link |
00:24:02.440
In the human, you can have much, much richer phenotypes where you can basically say not
link |
00:24:06.560
just imaging at the organ level and all kinds of other activities at the organ level, but
link |
00:24:13.920
you can also do at the organism level, you can do behavioral tests.
link |
00:24:19.480
And how did they do on empathy?
link |
00:24:21.120
How did they do on memory?
link |
00:24:22.920
How did they do on long-term memory versus short-term memory?
link |
00:24:26.160
And so on and so forth.
link |
00:24:27.160
I love how you're calling that phenotype.
link |
00:24:28.760
I guess it is.
link |
00:24:29.760
It is.
link |
00:24:31.040
But like your behavior patterns that might change over a period of a life, your ability
link |
00:24:37.880
to remember things, your ability to be empathetic or emotionally, your intelligence perhaps
link |
00:24:44.560
even.
link |
00:24:45.560
Yeah, but intelligence has hundreds of variables.
link |
00:24:47.160
It can be your math intelligence, your literary intelligence, your puzzle-solving intelligence,
link |
00:24:50.720
your logic.
link |
00:24:51.720
It could be like hundreds of things.
link |
00:24:52.880
And all of that, we're able to measure that better and better and all that could be connected
link |
00:24:57.440
to the entire pipeline somehow.
link |
00:24:58.920
We used to think of each of these as a single variable like intelligence.
link |
00:25:01.840
I mean, that's ridiculous.
link |
00:25:03.380
It's basically dozens of different genes that are controlling every single variable.
link |
00:25:10.880
You can basically think of, imagine us in a video game where every one of us has measures
link |
00:25:16.040
of strength, stamina, energy left and so on and so forth.
link |
00:25:20.960
But you could click on each of those five bars that are just the main bars and each
link |
00:25:24.440
of those will just give you then hundreds of bars and can basically say, okay, great
link |
00:25:28.560
for my machine learning task, I want someone who, a human who has these particular forms
link |
00:25:36.200
of intelligence.
link |
00:25:37.200
I require now these 20 different things.
link |
00:25:40.620
And then you can combine those things and then relate them to of course performance
link |
00:25:45.000
in a particular task, but you can also relate them to genetic variation that might be affecting
link |
00:25:50.820
different parts of the brain.
link |
00:25:52.800
For example, your frontal cortex versus your temporal cortex versus your visual cortex
link |
00:25:56.600
and so on and so forth.
link |
00:25:58.040
So genetic variation that affects expression of genes in different parts of your brain
link |
00:26:02.520
can basically affect your music ability, your auditory ability, your smell, just dozens
link |
00:26:08.920
of different phenotypes can be broken down into hundreds of cognitive variables and then
link |
00:26:15.980
relate each of those to thousands of genes that are associated with them.
link |
00:26:20.520
As somebody who loves RPGs and playing games, I find there are too few variables we can control in them.
link |
00:26:28.440
So I'm excited if we're in fact living in a simulation and this is a video game, I'm
link |
00:26:32.680
excited by the quality of the video game.
link |
00:26:37.240
The game designer did a hell of a good job.
link |
00:26:39.760
So we're impressed.
link |
00:26:40.760
Oh, I don't know.
link |
00:26:41.760
The sunset last night was a little unrealistic.
link |
00:26:43.800
Yeah.
link |
00:26:44.800
Yeah.
link |
00:26:45.800
The graphics.
link |
00:26:46.800
Exactly.
link |
00:26:47.800
Come on, NVIDIA.
link |
00:26:48.800
To zoom back out, we've been talking about the genetic origins of diseases, but I think
link |
00:26:54.480
it's fascinating to talk about what are the most important diseases to understand and
link |
00:27:01.080
especially as it connects to the things that you're working on.
link |
00:27:05.320
So it's very difficult to think about important diseases to understand.
link |
00:27:08.840
There's many metrics of importance.
link |
00:27:10.500
One is lifestyle impact.
link |
00:27:12.360
I mean, if you look at COVID, the impact on lifestyle has been enormous.
link |
00:27:16.440
So understanding COVID is important because it has impacted the wellbeing in terms of
link |
00:27:23.080
ability to have a job, ability to have an apartment, ability to go to work, ability
link |
00:27:27.280
to have a mental circle of support and all of that for millions of Americans, like huge,
link |
00:27:34.480
huge impact.
link |
00:27:35.520
So that's one aspect of importance.
link |
00:27:37.000
So basically mental disorders, Alzheimer's has a huge importance in the wellbeing of
link |
00:27:42.480
Americans.
link |
00:27:44.040
Whether or not it kills someone for many, many years, it has a huge impact.
link |
00:27:48.220
So the first measure of importance is just wellbeing.
link |
00:27:52.360
Impact on the quality of life.
link |
00:27:53.780
Impact on the quality of life, absolutely.
link |
00:27:55.860
The second metric, which is much easier to quantify, is deaths.
link |
00:28:00.160
What is the number one killer?
link |
00:28:01.920
The number one killer is actually heart disease.
link |
00:28:04.760
It is actually killing 650,000 Americans per year.
link |
00:28:10.700
Number two is cancer with 600,000 Americans.
link |
00:28:14.280
Number three, far, far down the list, is accidents, every single accident combined.
link |
00:28:19.600
So basically you read the news, accidents, like there was a huge car crash all over the
link |
00:28:24.720
news.
link |
00:28:25.800
But the number of deaths, number three by far, 167,000.
link |
00:28:31.320
Chronic respiratory disease.
link |
00:28:32.800
So that's asthma, not being able to breathe and so on and so forth, 160,000. Alzheimer's,
link |
00:28:39.160
number five with 120,000, and then stroke, brain aneurysms and so on and so forth, that's
link |
00:28:45.040
147,000. Diabetes and metabolic disorders, et cetera.
link |
00:28:49.720
That's 85,000.
link |
00:28:51.140
The flu is 60,000, suicide, 50,000 and then overdose, et cetera, you know, goes further
link |
00:28:58.960
down the list.
link |
00:29:00.120
So of course COVID has crept up to be the number three killer this year with, you know,
link |
00:29:06.620
more than 100,000 Americans and counting.
link |
00:29:11.360
And you know, but if you think about sort of what do we use, what are the most important
link |
00:29:16.560
diseases, you have to understand both the quality of life and the sheer number of deaths
link |
00:29:22.720
and just numbers of years lost if you wish.
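The annual U.S. death counts quoted above can be tabulated and sorted as a quick check on the stated ranking. The figures are only the rounded numbers spoken in this conversation (COVID and overdoses are left out because no single figure was given), not an authoritative dataset. Sorted this way, stroke (147,000) lands ahead of Alzheimer's (120,000).

```python
# Approximate annual U.S. deaths by cause, as quoted in the conversation.
# These are spoken round numbers, not an authoritative dataset.
QUOTED_DEATHS = {
    "heart disease": 650_000,
    "cancer": 600_000,
    "accidents": 167_000,
    "respiratory disease": 160_000,
    "stroke": 147_000,
    "Alzheimer's": 120_000,
    "diabetes and metabolic disorders": 85_000,
    "flu": 60_000,
    "suicide": 50_000,
}


def rank_causes(deaths: dict[str, int]) -> list[tuple[int, str, int]]:
    """Return (rank, cause, deaths) tuples, highest death count first."""
    ordered = sorted(deaths.items(), key=lambda item: item[1], reverse=True)
    return [(rank, cause, n) for rank, (cause, n) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, cause, n in rank_causes(QUOTED_DEATHS):
        print(f"{rank:>2}. {cause:<35} {n:>9,}")
```

Deaths alone ignore the quality-of-life and years-lost dimensions discussed above, which is exactly why the ranking is only one of several importance metrics.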
link |
00:29:25.560
And each of these diseases you can think of as, and also including terrorist attacks and
link |
00:29:30.960
school shootings, for example, things which lead to fatalities, you can look at as problems
link |
00:29:39.200
that could be solved.
link |
00:29:41.480
And some problems are harder to solve than others.
link |
00:29:44.080
I mean, that's part of the equation.
link |
00:29:46.860
So maybe if you look at these diseases, if you look at heart disease or cancer or Alzheimer's
link |
00:29:52.960
or just, like, schizophrenia and obesity, like not necessarily things that kill you,
link |
00:29:59.800
but affect the quality of life, which problems are solvable, which aren't, which are harder
link |
00:30:05.480
to solve, which aren't.
link |
00:30:07.280
I love your question because it puts it in the context of a global effort rather than
link |
00:30:13.720
just the local effort.
link |
00:30:15.120
So basically if you look at the global aspect, exercise and nutrition are two interventions
link |
00:30:22.560
that we as a society can do a much better job at.
link |
00:30:27.040
So if you think about sort of the availability of cheap food, it's extremely high in calories.
link |
00:30:33.080
It's extremely detrimental for you, like a lot of processed food, et cetera.
link |
00:30:36.960
So if we change that equation and as a society, we made availability of healthy food much,
link |
00:30:43.120
much easier and charged for a burger at McDonald's the price that it costs the health system,
link |
00:30:52.520
then people would actually start buying more healthy foods.
link |
00:30:56.360
So basically that's sort of a societal intervention, if you wish.
link |
00:30:59.600
In the same way, increasing empathy, increasing education, increasing the social framework
link |
00:31:06.580
and support would basically lead to fewer suicides.
link |
00:31:10.040
It would lead to fewer murders.
link |
00:31:11.880
It would lead to fewer deaths overall.
link |
00:31:15.880
So that's something that we as a society can do.
link |
00:31:19.100
You can also think about external factors versus internal factors.
link |
00:31:21.940
So the external factors are basically communicable diseases like COVID, like the flu, et cetera.
link |
00:31:27.440
And the internal factors are basically things like cancer and Alzheimer's where basically
link |
00:31:33.200
your genetics will eventually drive you there.
link |
00:31:38.560
And then of course, with all of these factors, every single disease has both the genetic
link |
00:31:43.520
component and environmental component.
link |
00:31:46.200
So heart disease, huge genetic contribution, Alzheimer's, it's like 60% plus genetic.
link |
00:31:55.700
So I think it's like 79% heritability.
link |
00:31:59.040
So that basically means that genetic differences alone explain 79% of the variation in Alzheimer's risk.
link |
00:32:06.400
And yes, there's a 21% environmental component where you could basically enrich your cognitive
link |
00:32:14.040
environment, enrich your social interactions, read more books, learn a foreign language,
link |
00:32:21.040
go running, you know, sort of have a more fulfilling life.
link |
00:32:24.800
All of that will actually decrease Alzheimer's, but there's a limit to how much that can impact
link |
00:32:29.320
because of the huge genetic footprint.
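Heritability is a ratio of variances, H² = Var(G) / Var(P), so "79% heritability" means genetic differences account for 79% of the population variance in risk, not that genes alone cause 79% of individual cases. A minimal simulation, assuming a purely additive model P = G + E with independent genetic and environmental parts and no gene-environment interaction; it is a textbook toy, not how any published Alzheimer's estimate was produced.

```python
import random


def variance(values: list[float]) -> float:
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate_heritability(h2: float, n: int = 100_000, seed: int = 0) -> float:
    """Simulate phenotypes P = G + E with independent G ~ N(0, h2) and
    E ~ N(0, 1 - h2), then return the empirical ratio Var(G) / Var(P)."""
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, h2 ** 0.5) for _ in range(n)]
    environment = [rng.gauss(0.0, (1.0 - h2) ** 0.5) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, environment)]
    return variance(genetic) / variance(phenotype)


if __name__ == "__main__":
    estimate = simulate_heritability(0.79)
    print(f"estimated heritability: {estimate:.3f}")  # close to 0.79
```

The remaining 21% of variance is the environmental share that enriched cognitive and social environments can act on, which is why the speaker describes a ceiling on what lifestyle alone can achieve.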
link |
00:32:31.240
So this is fascinating.
link |
00:32:32.240
So each one of these problems has a genetic component and an environment component.
link |
00:32:38.860
And so like when there's a genetic component, what can we do about some of these diseases?
link |
00:32:43.520
And from what you've worked on, what can you say in terms of problems that are solvable here
link |
00:32:48.520
or understandable?
link |
00:32:50.520
So my group works on the genetic component, but I would argue that understanding the genetic
link |
00:32:55.740
component can have a huge impact even on the environmental component.
link |
00:32:59.700
Why is that?
link |
00:33:00.800
Because genetics gives us access to mechanism.
link |
00:33:03.560
And if we can alter the mechanism, if we can impact the mechanism, we can perhaps counteract
link |
00:33:09.580
some of the environmental components.
link |
00:33:12.080
So understanding the biological mechanisms leading to disease is extremely important
link |
00:33:18.240
in being able to intervene.
link |
00:33:20.820
And to explain how you can intervene, you know, the analogy that I like to give is, for example,
link |
00:33:26.040
for obesity, you know, think of it as a giant bathtub of fat.
link |
00:33:29.880
There's basically fat coming in from your diet and there's fat coming out from your
link |
00:33:35.800
exercise.
link |
00:33:36.800
Okay.
link |
00:33:37.800
So that's an in out equation and that's the equation that everybody's focusing on.
link |
00:33:42.240
But your metabolism impacts that, you know, bathtub.
link |
00:33:47.740
Basically your metabolism controls the rate at which you're burning energy.
link |
00:33:53.080
It controls the rate at which you're storing energy.
link |
00:33:56.640
And it also teaches you about the various valves that control the input and the output
link |
00:34:02.800
equation.
link |
00:34:04.020
So if we can learn from the genetics, the valves, we can then manipulate those valves.
link |
00:34:11.320
And even if the environment is feeding you a lot of fat and getting only a little of that out,
link |
00:34:16.060
you can just poke another hole in the bathtub and just get a lot of the fat out.
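The bathtub analogy above can be written as a toy stock-and-flow model. This sketch assumes, purely for illustration, that the stored amount F drains at a rate proportional to F, so dF/dt = intake - (burn_rate + extra_drain) * F; all quantities are in arbitrary hypothetical units and this is not a physiological model. The point it shows is that opening an extra drain (a genetically informed "valve") lowers the steady-state level without changing the diet at all.

```python
def simulate_fat_store(intake: float, burn_rate: float, extra_drain: float = 0.0,
                       dt: float = 0.1, steps: int = 2_000) -> float:
    """Euler-integrate dF/dt = intake - (burn_rate + extra_drain) * F from F = 0
    and return the stored amount F after `steps` time steps."""
    fat = 0.0
    for _ in range(steps):
        outflow = (burn_rate + extra_drain) * fat  # outflow grows with what is stored
        fat += dt * (intake - outflow)
    return fat


def steady_state(intake: float, burn_rate: float, extra_drain: float = 0.0) -> float:
    """Level at which inflow equals outflow: F* = intake / (burn_rate + extra_drain)."""
    return intake / (burn_rate + extra_drain)


if __name__ == "__main__":
    # Same intake in both cases; only the drain changes.
    print(round(simulate_fat_store(10.0, 0.10), 2))                    # 100.0
    print(round(simulate_fat_store(10.0, 0.10, extra_drain=0.05), 2))  # 66.67
```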
link |
00:34:19.840
Yeah, that's fascinating.
link |
00:34:21.160
Yeah.
link |
00:34:22.160
So we're not just passive observers of our genetics.
link |
00:34:25.840
The more we understand, the more we can come up with actual treatments.
link |
00:34:29.640
And I think that's an important aspect to realize when people are thinking about strong
link |
00:34:35.680
effect versus weak effect variants.
link |
00:34:38.080
So some variants have strong effects.
link |
00:34:39.580
We talked about these Mendelian disorders where a single gene has a sufficiently large
link |
00:34:43.400
effect, penetrance, expressivity, and so on and so forth, that basically you can trace
link |
00:34:49.420
it in families with cases and not cases, cases, not cases, and so on and so forth.
link |
00:34:55.320
But so these are the genes that everybody says, oh, that's the genes we should go after
link |
00:35:02.840
because that's a strong effect gene.
link |
00:35:04.880
I like to think about it slightly differently.
link |
00:35:06.860
These are the genes where genetic impacts that have a strong effect were tolerated because
link |
00:35:15.440
every single time we have a genetic association with disease, it depends on two things.
link |
00:35:20.200
Number one, the obvious one, whether the gene has an impact on the disease.
link |
00:35:24.680
Number two, the more subtle one is whether there is genetic variation standing and circulating
link |
00:35:32.180
and segregating in the human population that impacts that gene.
link |
00:35:37.680
Some genes are so darn important that if you mess with them, even a tiny little amount,
link |
00:35:44.480
that person's dead.
link |
00:35:46.440
So those genes don't have variation.
link |
00:35:49.020
You're not going to find a genetic association if you don't have variation.
link |
00:35:53.040
That doesn't mean that the gene has no role.
link |
00:35:55.400
It simply means that the gene tolerates no mutations.
link |
00:35:59.120
So that's actually a strong signal when there's no variation.
link |
00:36:01.480
That's so fascinating.
link |
00:36:02.480
Exactly.
link |
00:36:03.480
Genes that have very little variation are hugely important.
link |
00:36:06.780
You can actually rank the importance of genes based on how little variation they have.
link |
00:36:10.840
And those genes that have very little variation but no association with disease, that's a
link |
00:36:16.920
very good metric to say, oh, that's probably a developmental gene because we're not good
link |
00:36:20.440
at measuring those phenotypes.
link |
00:36:22.840
So it's genes that you can tell evolution has excluded mutations from, but yet we can't
link |
00:36:29.040
see them associated with anything that we can measure nowadays.
link |
00:36:32.120
It's probably early embryonic lethal.
link |
00:36:34.840
What are all the words you just said?
link |
00:36:36.200
Early embryonic what?
link |
00:36:37.760
Lethal.
link |
00:36:38.760
Meaning?
link |
00:36:39.760
Meaning that that embryo will die.
link |
00:36:40.760
Okay.
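The idea of ranking genes by how little variation they carry can be sketched by comparing the loss-of-function variants actually observed in a population sample with the number a neutral mutation model would predict. This mirrors, in very simplified form, the observed/expected constraint scores published by projects such as gnomAD. The gene names, counts, and the 0.35 cutoff below are hypothetical, and the "developmental candidate" flag simply encodes the reasoning above: constrained genes with no measurable disease association.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    name: str                 # hypothetical gene label
    observed_lof: int         # loss-of-function variants seen in a population sample
    expected_lof: float       # count expected from mutation rates alone (assumed given)
    disease_associated: bool  # any known association with a measured phenotype


def oe_ratio(gene: GeneCounts) -> float:
    """Observed / expected: the lower it is, the fewer variants evolution has tolerated."""
    return gene.observed_lof / gene.expected_lof


def rank_by_constraint(genes: list[GeneCounts]) -> list[GeneCounts]:
    """Most constrained (lowest observed/expected) first."""
    return sorted(genes, key=oe_ratio)


def developmental_candidates(genes: list[GeneCounts], max_oe: float = 0.35) -> list[str]:
    """Highly constrained genes with no measured disease association; the
    0.35 cutoff is an arbitrary illustration, not a published threshold."""
    return [g.name for g in rank_by_constraint(genes)
            if oe_ratio(g) <= max_oe and not g.disease_associated]


GENES = [
    GeneCounts("GENE_A", observed_lof=2, expected_lof=40.0, disease_associated=False),
    GeneCounts("GENE_B", observed_lof=30, expected_lof=35.0, disease_associated=True),
    GeneCounts("GENE_C", observed_lof=5, expected_lof=20.0, disease_associated=True),
    GeneCounts("GENE_D", observed_lof=12, expected_lof=15.0, disease_associated=False),
]

if __name__ == "__main__":
    for g in rank_by_constraint(GENES):
        print(f"{g.name}: o/e = {oe_ratio(g):.2f}")
    print(developmental_candidates(GENES))  # ['GENE_A']
```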
link |
00:36:41.760
There's a bunch of stuff that is required for a stable functional organism across the
link |
00:36:49.160
board for an entire species, I guess.
link |
00:36:53.880
If you look at sperm, it expresses thousands of proteins.
link |
00:36:58.680
Does sperm actually need thousands of proteins?
link |
00:37:01.240
No, but it's probably just testing them.
link |
00:37:05.320
So my speculation is that misfolding of these proteins is an early test for failure.
link |
00:37:11.960
So that out of the millions of sperm that are possible, you select the subset that are
link |
00:37:18.440
just not grossly misfolding thousands of proteins.
link |
00:37:21.920
So it's kind of an assert that this is folded correctly.
link |
00:37:25.720
Correct.
link |
00:37:26.720
Yeah.
link |
00:37:27.720
Because if this little thing about the folding of a protein isn't correct, that
link |
00:37:32.560
probably means somewhere down the line, there's a bigger issue.
link |
00:37:35.720
That's exactly right.
link |
00:37:36.720
So fail fast.
link |
00:37:37.720
So basically if you look at the mammalian investment in a newborn, that investment is
link |
00:37:45.100
enormous in terms of resources.
link |
00:37:47.720
So mammals have basically evolved mechanisms for fail fast.
link |
00:37:52.840
Where basically in those early months of development, I mean it's horrendous of course at the personal
link |
00:37:58.880
level when you lose your future child, but in some ways there's so little hope for that
link |
00:38:08.680
child to develop and sort of make it through the remaining months that sort of fail fast
link |
00:38:12.880
is probably a good evolutionary principle for mammals.
link |
00:38:19.560
And of course humans have a lot of medical resources that can sort of give those
link |
00:38:24.920
children a chance and we have so much more success in sort of giving folks who have these
link |
00:38:33.120
strong carrier mutations a chance, but if they're not even making it through the first
link |
00:38:37.040
three months, we're not going to see them.
link |
00:38:39.860
So that's why when we say what are the most important genes to focus on, the ones that
link |
00:38:45.080
have a strong effect mutation or the ones that have a weak effect mutation, well the
link |
00:38:50.040
jury might be out because the ones that have a strong effect mutation are basically not
link |
00:38:57.080
mattering as much.
link |
00:38:58.720
The ones that only have weak effect mutations by understanding through genetics that they
link |
00:39:04.960
have a weak effect mutation and understanding that they have a causal role on the disease,
link |
00:39:10.200
we can then say, okay, great, evolution has only tolerated a 2% change in that gene.
link |
00:39:15.720
Pharmaceutically I can go in and induce a 70% change in that gene and maybe I will poke
link |
00:39:22.560
another hole in the bathtub that was not easy to control in many of the other sort of strong
link |
00:39:33.220
effect genetic variants.
link |
00:39:35.160
So there's this beautiful map across the population of what you're calling strong
link |
00:35:41.800
and weak effects, so stuff with a lot of mutations, stuff with few mutations, stuff with no mutations,
link |
00:39:48.200
and you have this map and it lays out the puzzle.
link |
00:39:51.360
Yeah.
link |
00:39:52.360
So when I say strong effect, I mean at the level of individual mutations.
link |
00:39:56.120
So basically genes where, so you have to think of first the effect of the gene on the disease.
link |
00:40:03.640
Remember how I was sort of painting that map earlier from genetics all the way to phenotype.
link |
00:40:10.240
That gene can have a strong effect on the disease, but the genetic variant might have
link |
00:40:15.960
a weak effect on the gene.
link |
00:40:18.880
So basically when you ask what is the effect of that genetic variant on the disease, it
link |
00:40:24.960
could be that that genetic variant impacts the gene by a lot and then the gene impacts
link |
00:40:29.560
the disease by a little, or it could be that the genetic variant impacts the gene by a
link |
00:40:33.240
little and then the gene impacts the disease by a lot.
link |
00:40:35.880
So what we care about is genes that impact the disease a lot, but genetics gives us the
link |
00:40:41.720
full equation and what I would argue is if we couple the genetics with expression variation
link |
00:40:51.920
to basically ask what genes change by a lot and which genes correlate with disease by
link |
00:41:00.400
a lot, even if the genetic variants change them by a little, then those are the best
link |
00:41:06.200
places to intervene.
link |
00:41:07.200
Those are the best places where, pharmaceutically, if I have even a modest effect, I will have
link |
00:41:13.200
a strong effect on the disease, whereas those genetic variants that have a huge effect on
link |
00:41:17.120
the disease, I might not be able to change that gene by this much without affecting all
link |
00:41:21.120
kinds of other things.
link |
00:41:22.360
Interesting.
link |
00:41:23.360
So that's what we're looking at.
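The two-step picture above (variant changes the gene, gene changes the disease) can be sketched with a linear mediation assumption: the variant's effect on disease is the product of the two path effects. The 2% and 70% figures come from the conversation; the gene labels and effect sizes are invented, and real dose-response curves are rarely linear over a 70% change. Two genes can show the same genetic association while differing enormously in how much a drug acting on them could help.

```python
from dataclasses import dataclass


@dataclass
class GeneEffects:
    name: str               # hypothetical gene label
    variant_on_gene: float  # fractional expression change caused by the variant
    gene_on_disease: float  # disease effect per unit fractional expression change

    def variant_on_disease(self) -> float:
        # Linear mediation assumption: total effect = product of the two path effects.
        return self.variant_on_gene * self.gene_on_disease

    def drug_on_disease(self, drug_change: float) -> float:
        # Same linear assumption, with a drug replacing the variant as the perturbation.
        return drug_change * self.gene_on_disease


GENES = [
    # Variant moves expression a lot, but the gene matters little for disease.
    GeneEffects("gene_X", variant_on_gene=0.50, gene_on_disease=0.1),
    # Evolution tolerated only a 2% change, but the gene matters a lot.
    GeneEffects("gene_Y", variant_on_gene=0.02, gene_on_disease=2.5),
]

if __name__ == "__main__":
    for g in GENES:
        print(f"{g.name}: genetic association {g.variant_on_disease():.3f}, "
              f"drug with 70% change {g.drug_on_disease(0.70):.3f}")
```

Both genes show the same association (0.05), but the 70% pharmaceutical change is far more effective on gene_Y, which is the "weak variant, strong gene" case the speaker argues is the best place to intervene.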
link |
00:41:26.040
What have we been able to find in terms of which disease could be helped?
link |
00:41:31.800
Again, don't get me started.
link |
00:41:37.280
We have found so much.
link |
00:41:38.960
Our understanding of disease has changed so dramatically with genetics.
link |
00:41:46.000
I mean places that we had no idea would be involved.
link |
00:41:49.000
So one of the worst things about my genome is that I have a genetic predisposition to
link |
00:41:53.920
age-related macular degeneration, AMD.
link |
00:41:56.520
So it's a form of blindness that causes you to lose the central part of your vision progressively
link |
00:42:02.260
as you grow older.
link |
00:42:04.400
My increased risk is fairly small.
link |
00:42:06.240
I have an 8% chance.
link |
00:42:07.680
You only have a 6% chance.
link |
00:42:10.080
On average.
link |
00:42:11.080
By the way, when you say my, you mean literally yours.
link |
00:42:14.560
You know this about you.
link |
00:42:15.880
I know this about me.
link |
00:42:18.000
Which is kind of, I mean philosophically speaking is a pretty powerful thing to live with.
link |
00:42:26.500
Maybe that's, so, by the way, for the listeners, we agreed to talk again, and we're going
link |
00:42:31.680
to try to focus on science today and a little bit of philosophy next time.
link |
00:42:36.080
But it's interesting to think about the more you're able to know about yourself from the
link |
00:42:42.880
genetic information in terms of the diseases, how that changes your own view of life.
link |
00:42:49.360
So there's a lot of impact there and there's something called genetic exceptionalism,
link |
00:42:56.000
which basically thinks of genetics as something very, very different than everything else
link |
00:43:01.040
as a type of determinism.
link |
00:43:04.200
And you know, let's talk about that next time.
link |
00:43:07.320
So basically.
link |
00:43:08.320
That's a good preview.
link |
00:43:09.320
Yeah.
link |
00:43:10.320
So let's go back to AMD.
link |
00:43:11.680
So basically with AMD, we had no idea what causes AMD.
link |
00:43:16.920
You know, it was, it was a mystery until the genetics were worked out.
link |
00:43:23.700
And now the fact that I know that I have a predisposition allows me to sort of make some
link |
00:43:28.640
life choices, number one, but number two, the genes that lead to that predisposition
link |
00:43:34.720
give us insights as to how does it actually work.
link |
00:43:38.520
And that's a place where genetics gave us something totally unexpected.
link |
00:43:42.960
So there's a complement pathway, which is an immune function pathway that was in, you
link |
00:43:52.000
know, most of the loci associated with AMD.
link |
00:43:55.940
And that basically told us that, wow, there's an immune basis to this eye disorder that
link |
00:44:02.600
people had just not expected before.
link |
00:44:05.180
If you look at complement, it was recently also implicated in schizophrenia.
link |
00:44:11.160
And there's a type of microglia that is involved in synaptic pruning.
link |
00:44:17.280
So synapses are the connections between neurons.
link |
00:44:20.560
And in this whole use-it-or-lose-it view of mental cognition and other capabilities, you
link |
00:44:27.160
basically have microglia, which are immune cells that are sort of constantly traversing
link |
00:44:32.960
your brain and then pruning neuronal connections, pruning synaptic connections that are not
link |
00:44:38.640
utilized.
link |
00:44:40.260
So in schizophrenia, there's thought to be a change in the pruning that basically if
link |
00:44:47.960
you don't prune your synapses the right way, you will actually have an increased risk of
link |
00:44:53.280
schizophrenia.
link |
00:44:54.280
This is something that was completely unexpected for schizophrenia.
link |
00:44:57.160
Of course, we knew it has to do with neurons, but the role of the complement complex, which
link |
00:45:01.560
is also implicated in AMD, which is now also implicated in schizophrenia, was a huge surprise.
link |
00:45:06.520
What's the complement complex?
link |
00:45:08.040
So it's basically a set of genes, the complement genes, that basically have various immune
link |
00:45:13.960
roles.
link |
00:45:15.460
And as I was saying earlier, our immune system has been coopted for many different roles
link |
00:45:19.940
across the body.
link |
00:45:21.220
So they actually play many diverse roles.
link |
00:45:23.440
And somehow the immune system is connected to the synaptic pruning process.
link |
00:45:29.600
Exactly.
link |
00:45:30.600
So the immune cells were co-opted to prune synapses.
link |
00:45:33.080
How did you figure this out?
link |
00:45:35.720
How does one go about figuring this intricate connection, like pipeline of connections out?
link |
00:45:41.920
Yeah.
link |
00:45:42.920
Let me give you another example.
link |
00:45:44.280
So Alzheimer's disease, the first place that you would expect it to act is obviously the
link |
00:45:48.920
brain.
link |
00:45:49.920
So we had basically this Roadmap Epigenomics Consortium view of the human epigenome, the
link |
00:45:57.000
largest map of the human epigenome that has ever been built across 127 different tissues
link |
00:46:04.440
and samples with dozens of epigenomic marks measured in hundreds of donors.
link |
00:46:10.560
So what we've basically learned through that is that you basically can map what are the
link |
00:46:16.600
active gene regulatory elements for every one of the tissues in the body.
link |
00:46:20.280
And then we connected these gene regulatory active maps of basically what regions of the
link |
00:46:27.400
human genome are turning on in every one of different tissues.
link |
00:46:32.000
We then can go back and say, where are all of the genetic loci that are associated with
link |
00:46:38.600
disease?
link |
00:46:39.600
This is something that my group, I think, was the first to do, back in 2010, in this Ernst
link |
00:46:46.400
Nature Biotech paper, but basically we were for the first time able to show that specific
link |
00:46:52.040
chromatin states, specific epigenomic states, in that case enhancers, were in fact enriched
link |
00:46:58.560
in disease associated variants.
link |
00:47:00.720
We pushed that further in the Ernst Nature paper a year later.
link |
00:47:05.640
And then in this Roadmap Epigenomics paper a few years after that. But basically that
link |
00:47:12.680
matrix that you mentioned earlier was in fact the first time that we could see what genetic
link |
00:47:18.160
traits have genetic variants that are enriched in what tissues in the body.
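The tissue-enrichment idea above can be sketched as a comparison between the fraction of a trait's associated variants that fall in one tissue's enhancers and the fraction of the genome those enhancers cover, with a one-sided binomial test for significance. This is a deliberately simplified version, assuming variants are independent and uniformly placed under the null; real analyses, including the papers mentioned, must handle linkage disequilibrium and matched backgrounds. All counts below are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enhancer_enrichment(hits: int, n_variants: int, genome_fraction: float) -> tuple[float, float]:
    """Fold enrichment and one-sided binomial p-value for a trait's variants
    overlapping one tissue's enhancers (simplified null: uniform placement)."""
    fold = (hits / n_variants) / genome_fraction
    return fold, binomial_upper_tail(hits, n_variants, genome_fraction)


if __name__ == "__main__":
    # Hypothetical trait with 200 associated variants; each tissue's enhancers
    # cover 2% of the genome, so about 4 overlaps are expected by chance.
    for tissue, hits in [("tissue_1", 20), ("tissue_2", 5)]:
        fold, p = enhancer_enrichment(hits, 200, 0.02)
        print(f"{tissue}: {fold:.2f}x enrichment, p = {p:.2g}")
```

Repeating this test for every trait against every tissue produces the trait-by-tissue matrix described in the conversation.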
link |
00:47:26.360
And a lot of that map made complete sense.
link |
00:47:28.800
If you looked at a diversity of immune traits like allergies and type one diabetes and so
link |
00:47:33.800
on and so forth, you basically could see that they were enriching, that the genetic variants
link |
00:47:38.920
associated with those traits were enriched in enhancers in these gene regulatory elements
link |
00:47:44.680
active in T cells and B cells and hematopoietic stem cells and so on and so forth.
link |
00:47:49.280
So that basically gave us a confirmation in many ways that those immune traits were indeed
link |
00:47:56.960
enriched in immune cells.
link |
00:48:00.360
If you looked at type two diabetes, you basically saw an enrichment in only one type of sample
link |
00:48:06.080
and it was pancreatic islets.
link |
00:48:08.960
And we know that type two diabetes sort of stems from the dysregulation of insulin in
link |
00:48:14.960
the beta cells of pancreatic islets.
link |
00:48:17.440
And that sort of was spot on, super precise.
link |
00:48:21.200
If you looked at blood pressure, where would you expect blood pressure to occur?
link |
00:48:25.880
You know, I don't know, maybe in your metabolism and ways that you process coffee or something
link |
00:48:29.880
like that.
link |
00:48:30.880
Maybe in your brain, the way that you stress out and that increases your blood pressure, et
link |
00:48:33.880
cetera.
link |
00:48:34.880
So the blood pressure localized specifically in the left ventricle of the heart.
link |
00:48:40.360
So the enhancers of the left ventricle in the heart contained a lot of genetic variants
link |
00:48:44.100
associated with blood pressure.
link |
00:48:46.760
If you look at height, we found an enrichment specifically in embryonic stem cell enhancers.
link |
00:48:53.280
So the genetic variants predisposing you to be taller or shorter are in fact acting in
link |
00:48:57.480
developmental stem cells, makes complete sense.
link |
00:49:01.380
If you looked at inflammatory bowel disease, you basically found inflammatory, which is
link |
00:49:05.920
immune, and also bowel disease, which is digestive.
link |
00:49:09.880
And indeed we saw a double enrichment both in the immune cells and in the digestive cells.
link |
00:49:15.800
So that basically told us that this is acting in both components.
link |
00:49:19.040
There's an immune component to inflammatory bowel disease and there's a digestive component.
link |
00:49:23.460
And the big surprise was for Alzheimer's.
link |
00:49:25.960
We had seven different brain samples.
link |
00:49:29.160
We found zero enrichment in the brain samples for genetic variants associated with Alzheimer's.
link |
00:49:36.240
And this is mind boggling.
link |
00:49:38.020
Our brains were literally hurting.
link |
00:49:40.040
What is going on?
link |
00:49:42.000
And what is going on is that the brain samples are primarily neurons, oligodendrocytes, and
link |
00:49:49.080
astrocytes in terms of the cell types that make them up.
link |
00:49:54.120
So that basically indicated that genetic variants associated with Alzheimer's were probably
link |
00:49:59.400
not acting in oligodendrocytes, astrocytes, or neurons.
link |
00:50:04.560
So what could they be acting in?
link |
00:50:05.960
Well, the fourth major cell type is actually microglia.
link |
00:50:10.200
Microglia are resident immune cells in your brain.
link |
00:50:13.720
Oh, nice.
link |
00:50:15.880
They're immune.
link |
00:50:16.880
Oh, wow.
link |
00:50:17.880
They are CD14 plus, which is a cell surface marker of those cells.
link |
00:50:24.160
So they're CD14 plus cells, just like the monocytes that are circulating in your blood.
link |
00:50:35.640
The microglia are resident macrophages that are basically sitting in your brain.
link |
00:50:38.400
They're tissue-specific macrophages.
link |
00:50:38.400
And every one of your tissues, like your fat, for example, has a lot of macrophages that
link |
00:50:42.840
are resident.
link |
00:50:43.920
And the M1 versus M2 macrophage ratio has a huge role to play in obesity.
link |
00:50:49.560
And so basically, again, these immune cells are everywhere, but basically what we found
link |
00:50:53.440
through this completely unbiased view of what are the tissues that likely underlie different
link |
00:50:59.080
disorders, we found that Alzheimer's was humongously enriched in microglia, but not at all in the
link |
00:51:08.080
other cell types.
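The "unbiased view" described above amounts to testing every trait against every sample and keeping only the enrichments that survive multiple-testing correction. A minimal sketch with a Bonferroni cutoff follows; the p-values are invented to mimic the qualitative pattern in the conversation (type 2 diabetes in pancreatic islets, Alzheimer's in CD14-positive immune cells and not in brain) and are not results from any study.

```python
# Invented p-values (trait -> sample -> p) that mimic the qualitative pattern
# described in the conversation; they are not results from any study.
P_VALUES = {
    "type 2 diabetes": {"pancreatic islets": 1e-9, "brain": 0.4, "CD14+ cells": 0.6, "T cells": 0.5},
    "Alzheimer's": {"pancreatic islets": 0.7, "brain": 0.3, "CD14+ cells": 1e-8, "T cells": 0.2},
}


def enriched_samples(p_values: dict[str, dict[str, float]],
                     alpha: float = 0.05) -> dict[str, list[str]]:
    """For each trait, list samples passing a Bonferroni cutoff of
    alpha divided by the total number of trait-sample tests."""
    n_tests = sum(len(row) for row in p_values.values())
    cutoff = alpha / n_tests
    return {trait: sorted(s for s, p in row.items() if p < cutoff)
            for trait, row in p_values.items()}


if __name__ == "__main__":
    for trait, samples in enriched_samples(P_VALUES).items():
        print(f"{trait}: {samples or 'no significant enrichment'}")
```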
link |
00:51:09.080
So what are we supposed to make of that? If you look at the tissues involved, is that simply
link |
00:51:15.480
useful as an indication of propensity for disease, or does it somehow give us a pathway to treatment?
link |
00:51:24.640
It's very much the second.
link |
00:51:26.120
If you look at the way to therapeutics, you have to start somewhere.
link |
00:51:33.900
What are you going to do?
link |
00:51:34.900
You're going to basically make assays that manipulate those genes and those pathways
link |
00:51:42.040
in those cell types.
link |
00:51:43.780
So before we know the tissue of action, we don't even know where to start.
link |
00:51:49.200
We basically are at a loss.
link |
00:51:51.040
But if you know the tissue of action, and even better, if you know the pathway of action,
link |
00:51:54.720
then you can basically screen your small molecules, not for the gene, you can screen them directly
link |
00:52:00.280
for the pathway in that cell type.
link |
00:52:02.640
So you can basically develop a high throughput multiplexed robotic system for testing the
link |
00:52:10.480
impact of your favorite molecules that you know are safe, efficacious, and sort of hit
link |
00:52:16.240
that particular gene and so on and so forth.
link |
00:52:18.940
You can basically screen those molecules against either a set of genes that act in that pathway
link |
00:52:25.720
or on the pathway directly by having a cellular assay.
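A cellular pathway screen like the one described above ends with a hit-calling step. One common, simple approach, assumed here rather than taken from the conversation, is to express each compound's pathway readout as a z-score relative to untreated control wells and keep compounds beyond a cutoff. The compound names, readouts, and the cutoff of 3 are hypothetical; production screens add plate-level normalization and quality controls.

```python
import statistics

# Hypothetical pathway readouts (arbitrary units) from untreated control wells
# and from wells treated with each candidate compound.
CONTROLS = [100.0, 102.0, 98.0, 101.0, 99.0]
COMPOUNDS = {"cmpd_1": 100.5, "cmpd_2": 90.0, "cmpd_3": 104.0, "cmpd_4": 106.0}


def screen_hits(control_readouts: list[float], compound_readouts: dict[str, float],
                z_cutoff: float = 3.0) -> dict[str, float]:
    """Return {compound: z-score} for compounds whose readout deviates from the
    control mean by at least z_cutoff control standard deviations."""
    mu = statistics.mean(control_readouts)
    sd = statistics.stdev(control_readouts)  # sample standard deviation
    hits = {}
    for compound, value in compound_readouts.items():
        z = (value - mu) / sd
        if abs(z) >= z_cutoff:
            hits[compound] = round(z, 2)
    return hits


if __name__ == "__main__":
    print(screen_hits(CONTROLS, COMPOUNDS))
```

Negative z-scores mark compounds that dampen the pathway and positive ones mark compounds that boost it; which direction is desirable depends on the disease mechanism.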
link |
00:52:29.820
And then you can basically go into mice and do experiments and basically sort of figure
link |
00:52:33.400
out ways to manipulate these processes that allow you to then go back to humans and do
link |
00:52:38.800
a clinical trial that basically says, okay, I was able indeed to reverse these processes
link |
00:52:43.200
in mice.
link |
00:52:44.200
Can I do the same thing in humans?
link |
00:52:46.240
So the knowledge of the tissues gives you the pathway to treatment, but that's not the
link |
00:52:51.820
only part.
link |
00:52:52.820
There are many additional steps to figuring out the mechanism of disease.
link |
00:52:57.280
So that's really promising.
link |
00:52:59.080
Maybe to take a small step back, you've mentioned all these puzzles that were figured out with
link |
00:53:04.360
the Nature paper for, I mean, you've mentioned a ton of diseases from obesity to Alzheimer's,
link |
00:53:13.960
even schizophrenia, I think you mentioned.
link |
00:53:17.720
What is the actual methodology of figuring this out?
link |
00:53:20.720
So indeed, I mentioned a lot of diseases and my lab works on a lot of different disorders.
link |
00:53:26.040
And the reason for that is that if you look at biology, it used to be zoology departments
link |
00:53:39.500
and botany departments and virology departments and so on and so forth.
link |
00:53:43.680
And MIT was one of the first schools to basically create a biology department, like, oh, we're
link |
00:53:47.680
going to study all of life suddenly.
link |
00:53:49.640
Why was that even the case?
link |
00:53:51.720
Because the advent of DNA and the genome and the central dogma of DNA makes RNA makes protein
link |
00:53:58.480
in many ways unified biology.
link |
00:54:01.600
You could suddenly study the process of transcription in viruses or in bacteria and have a huge
link |
00:54:07.480
impact on yeast and fly and maybe even mammals because of this realization of these common
link |
00:54:15.040
underlying processes.
link |
00:54:17.760
And in the same way that DNA unified biology, genetics is unifying disease studies.
link |
00:54:27.180
So you used to have, I don't know, cardiovascular disease department and neurological disease
link |
00:54:39.440
department and neurodegeneration department and basically immune and cancer and so on
link |
00:54:47.640
and so forth.
link |
00:54:48.640
And all of these were studied in different labs because it made sense, because basically
link |
00:54:53.600
the first step was understanding how the tissue functions and we kind of knew the tissues
link |
00:54:57.560
involved in cardiovascular disease and so on and so forth.
link |
00:55:00.680
But what's happening with human genetics is that all of these walls and edifices that
link |
00:55:05.760
we had built are crumbling.
link |
00:55:08.480
And the reason for that is that genetics is in many ways revealing unexpected connections.
link |
00:55:16.560
So suddenly we now have to bring the immunologists to work on Alzheimer's.
link |
00:55:21.480
They were never in the room.
link |
00:55:22.680
They were in another building altogether.
link |
00:55:25.920
The same way for schizophrenia, we now have to sort of worry about all these interconnected
link |
00:55:31.600
aspects.
link |
00:55:33.200
For metabolic disorders, we're finding contributions from the brain.
link |
00:55:37.500
So suddenly we have to call the neurologist from the other building and so on and so forth.
link |
00:55:41.340
So in my view, it makes no sense anymore to basically say, oh, I'm a geneticist studying
link |
00:55:49.200
immune disorders.
link |
00:55:50.200
I mean, that's ridiculous because, I mean, of course in many ways you still need to sort
link |
00:55:55.360
of focus.
link |
00:55:56.480
But what we're doing is that we're basically saying we'll go wherever the genetics takes
link |
00:56:01.080
us.
link |
00:56:02.440
And we do that by building these massive resources. Our latest map now covers 833 tissues.
link |
00:56:10.440
It's sort of the next generation of the Epigenomics Roadmap, which we now call EpiMap, and it spans
link |
00:56:15.560
833 different tissues.
link |
00:56:18.120
And using those, we've basically found enrichments in 540 different disorders.
link |
00:56:24.620
Those enrichments are not like, oh great, you guys work on that and we'll work on this.
link |
00:56:29.340
They're intertwined amazingly.
link |
00:56:31.980
So of course there's a lot of modularity, but there are these enhancers that are sort
link |
00:56:36.120
of broadly active and these disorders that are broadly active.
link |
00:56:39.040
So basically some enhancers are active in all tissues and some disorders show enrichment
link |
00:56:43.480
in all tissues.
link |
00:56:44.480
So basically there are these multifactorial diseases and this other class, which I like to call
link |
00:56:49.160
polyfactorial diseases, which are basically lighting up everywhere.
link |
00:56:54.560
And in many ways it's, you know, sort of cutting across these walls that were previously built
link |
00:57:00.000
across these departments.
link |
00:57:01.760
And the polyfactorial ones are probably the ones the previous departmental structure wasn't equipped
link |
00:57:07.040
to deal with.
link |
00:57:08.040
I mean, again, maybe it's a romanticized question, but you know, in physics, there's
link |
00:57:14.680
a theory of everything.
link |
00:57:16.920
Do you think it's possible to move towards an almost theory of everything of disease
link |
00:57:22.400
from a genetic perspective?
link |
00:57:24.100
So if this unification continues, is it possible that, like, do you think in those terms, like
link |
00:57:29.640
trying to arrive at a fundamental understanding of how disease emerges, period?
link |
00:57:35.720
That unification is not just foreseeable, it's inevitable.
link |
00:57:41.680
I see it as inevitable.
link |
00:57:43.600
We have to go there.
link |
00:57:45.240
You cannot be a specialist anymore.
link |
00:57:48.340
If you're a genomicist, you have to be a specialist in every single disorder.
link |
00:57:53.840
And the reason for that is that the fundamental understanding of the circuitry of the human
link |
00:57:59.960
genome that you need to solve schizophrenia, that fundamental circuitry is hugely important
link |
00:58:07.960
to solve Alzheimer's.
link |
00:58:09.600
And that same circuitry is hugely important to solve metabolic disorders.
link |
00:58:13.100
And that same exact circuitry is hugely important for solving immune disorders and cancer and,
link |
00:58:20.040
you know, every single disease.
link |
00:58:22.260
So all of them have the same subtask.
link |
00:58:26.680
And I teach dynamic programming in my class.
link |
00:58:29.880
Dynamic programming is all about sort of not redoing the work.
link |
00:58:34.400
It's reusing the work that you do once.
link |
00:58:37.280
So basically for us to say, oh, great, you know, you guys in the immune building go solve
link |
00:58:42.240
the fundamental circuitry of everything.
link |
00:58:44.240
And then you guys in the schizophrenia building go solve the fundamental circuitry of everything
link |
00:58:47.680
separately, is crazy.
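The dynamic-programming point above is about computing a shared subproblem once and reusing it. A minimal sketch of that idea using memoization (the top-down form of dynamic programming); `circuitry_map` and `analyze_disease` are hypothetical stand-ins, not real tools from the group:

```python
from functools import lru_cache

# Counts how many times the (hypothetical) shared computation actually runs.
CALLS = {"circuitry": 0}

@lru_cache(maxsize=None)
def circuitry_map(genome_build: str) -> dict:
    """Hypothetical expensive shared analysis, computed once per genome build."""
    CALLS["circuitry"] += 1
    return {"build": genome_build, "links": ("enhancer_A->gene_X", "enhancer_B->gene_Y")}

def analyze_disease(disease: str, genome_build: str = "hg38") -> str:
    # Every disease-specific analysis reuses the same cached circuitry result.
    shared = circuitry_map(genome_build)
    return f"{disease}: used {len(shared['links'])} shared links"

results = [analyze_disease(d) for d in ("immune", "schizophrenia", "Alzheimer's")]
print(results)
print("circuitry computed", CALLS["circuitry"], "time(s)")
```

Three disease groups query the circuitry, but the expensive step runs only once; the cache plays the role of the shared "circuitry building."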
link |
00:58:50.080
So what we need to do is come together and sort of have a circuitry group, the circuitry
link |
00:58:56.080
building that sort of tries to solve the circuitry of everything.
link |
00:58:59.520
And then the immune folks will apply this knowledge to all of the disorders that are
link |
00:59:05.920
associated with immune dysfunction and the schizophrenia folks will basically be interacting
link |
00:59:12.460
with both the immune folks and with the neuronal folks.
link |
00:59:15.560
And all of them will be interacting with the circuitry folks and so on and so forth.
link |
00:59:19.000
So that's sort of the current structure of my group, if you wish.
link |
00:59:22.320
So basically what we're doing is focusing on the fundamental circuitry.
link |
00:59:27.200
But at the same time, we're the users of our own tools by collaborating with many other
link |
00:59:34.020
labs in every one of these disorders that we mentioned.
link |
00:59:37.480
We basically have a heart focus on cardiovascular disease, coronary artery disease, heart failure
link |
00:59:42.880
and so on and so forth.
link |
00:59:44.280
We have an immune focus on several immune disorders.
link |
00:59:48.900
We have a cancer focus on metastatic melanoma and immunotherapy response.
link |
00:59:55.580
We have a psychiatric disease focus on schizophrenia, autism, PTSD, and other psychiatric disorders.
link |
01:00:04.120
We have an Alzheimer's and neurodegeneration focus on Huntington's disease, ALS and, you
link |
01:00:10.280
know, AD-related disorders like frontotemporal dementia and Lewy body dementia.
link |
01:00:14.460
And of course, a huge focus on Alzheimer's.
link |
01:00:16.740
We have a metabolic focus on the role of exercise and diets and sort of how they're impacting
link |
01:00:23.320
metabolic organs across the body and across many different tissues.
link |
01:00:29.120
And all of them are interfacing with the circuitry.
link |
01:00:34.100
And the reason for that is another computer science principle: eat your own dog food.
link |
01:00:42.180
If everybody ate their own dog food, dog food would taste a lot better.
link |
01:00:47.760
The reason why Microsoft Excel and Word and PowerPoint were so important and so successful
link |
01:00:55.080
is because the employees who were working on them were using them for their
link |
01:01:00.000
day-to-day tasks.
link |
01:01:01.500
You can't just simply build a circuitry and say, here it is guys, take the circuitry,
link |
01:01:06.120
we're done without being the users of that circuitry because you then go back.
link |
01:01:11.440
And because we span the whole spectrum from profiling the epigenome, using comparative
link |
01:01:16.740
genomics, finding the important nucleotides in the genome, building the basic functional
link |
01:01:21.220
map of what are the genes in the human genome, what are the gene regulatory elements of the
link |
01:01:26.800
human genome.
link |
01:01:27.800
I mean, over the years we've written a series of papers on how do you find human genes in
link |
01:01:31.720
the first place using comparative genomics?
link |
01:01:34.080
How do you find the motifs that are the building blocks of gene regulation using comparative
link |
01:01:38.840
genomics?
link |
01:01:39.840
And how do you then find how these motifs come together and act in specific tissues
link |
01:01:44.800
using epigenomics?
link |
01:01:46.280
How do you link regulators to enhancers and enhancers to their target genes using epigenomics
link |
01:01:53.860
and regulatory genomics?
link |
01:01:55.260
So through the years we've basically built all this infrastructure for understanding
link |
01:02:00.320
what I like to say, every single nucleotide of the human genome and how it acts in every
link |
01:02:06.900
one of the major cell types and tissues of the human body.
link |
01:02:10.320
I mean, this is no small task.
link |
01:02:12.040
This is an enormous task that takes the entire field.
link |
01:02:15.540
And that's something that my group has taken on along with many other groups.
link |
01:02:20.720
And we have also, and that's the sort of thing that perhaps sets my group apart, we have also
link |
01:02:25.340
worked with specialists in every one of these disorders to basically further our understanding
link |
01:02:30.640
all the way down to disease and in some cases collaborating with pharma to go all the way
link |
01:02:35.280
down to therapeutics because of our deep, deep understanding of that basic circuitry
link |
01:02:42.480
and how it allows us to now improve the circuitry.
link |
01:02:47.600
Not just treat it as a black box, but basically go and say, okay, we need better
link |
01:02:51.880
cell-type-specific wiring than what we now have at the tissue-specific level.
link |
01:02:56.480
So we're focusing on that because we're understanding the needs from the disease front.
link |
01:03:01.560
So you have a sense of the entire pipeline, I mean, one, maybe you can indulge me.
link |
01:03:08.040
One nice question to ask would be, how do you, from the scientific perspective, go from
link |
01:03:14.700
knowing nothing about the disease to going, as you said, through the entire pipeline and
link |
01:03:22.040
actually having a drug or a treatment that cures that disease?
link |
01:03:26.840
So that's an enormously long path and an enormously great challenge.
link |
01:03:32.840
And what I'm trying to argue is that it progresses in stages of understanding rather than one
link |
01:03:39.560
gene at a time.
link |
01:03:40.960
The traditional view of biology was you have one postdoc working on this gene and another
link |
01:03:45.200
postdoc working on that gene, and they'll just figure out everything about that gene
link |
01:03:50.260
and that's their job.
link |
01:03:52.120
But we've realized how polygenic the diseases are, so we can't have one postdoc per gene
link |
01:03:57.840
anymore.
link |
01:03:58.840
We now have to have these cross-cutting needs.
link |
01:04:04.360
And I'm going to describe the path to circuitry along those needs.
link |
01:04:10.480
And every single one of these paths, we are now doing in parallel across thousands of
link |
01:04:15.600
genes.
link |
01:04:17.000
So the first step is you have a genetic association, and we talked a little bit about sort of the
link |
01:04:23.160
Mendelian path and the polygenic path to that association.
link |
01:04:27.760
So the Mendelian path was looking through families to basically find gene regions and
link |
01:04:33.320
ultimately genes that are underlying particular disorders.
link |
01:04:36.860
The polygenic path is basically looking at unrelated individuals in this giant matrix
link |
01:04:43.240
of genotype by phenotype, and then finding hits where a particular variant impacts disease
link |
01:04:49.200
all the way to the end.
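The "genotype by phenotype" matrix scan described above can be sketched as a per-variant association test. This is a minimal sketch assuming a case/control phenotype and a simple allelic chi-square test on toy data; real studies typically use regression models with covariates instead:

```python
from math import erfc, sqrt

def allelic_chi_square(genotypes, is_case, j):
    """Allelic 2x2 chi-square statistic for variant column j (1 degree of freedom).

    Assumes the variant is polymorphic in the sample (otherwise an expected count is 0).
    """
    # Rows: cases, controls. Columns: alternate alleles, reference alleles.
    table = [[0, 0], [0, 0]]
    for row, case in zip(genotypes, is_case):
        r = 0 if case else 1
        alt = row[j]              # genotype coded as 0, 1 or 2 alternate alleles
        table[r][0] += alt
        table[r][1] += 2 - alt    # each person carries two alleles
    n = sum(map(sum, table))
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = sum(table[r]) * (table[0][c] + table[1][c]) / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2

def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))

# Toy genotype-by-phenotype matrix: 6 individuals (rows) x 2 variants (columns).
genotypes = [[2, 1], [2, 0], [1, 1],   # cases
             [0, 1], [0, 0], [1, 1]]   # controls
is_case = [True, True, True, False, False, False]

stats = [allelic_chi_square(genotypes, is_case, j) for j in range(2)]
for j, s in enumerate(stats):
    print(f"variant {j}: chi2={s:.2f}, p={p_value_1df(s):.3g}")
```

Variant 0 is over-represented in cases and gets a large statistic; variant 1 has identical allele counts in both groups and scores zero.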
link |
01:04:51.520
And then we now have a connection, not between a gene and a disease, but between a genetic
link |
01:04:57.960
region and a disease.
link |
01:05:00.200
And that distinction is not understood by most people.
link |
01:05:03.520
So I'm going to explain it a little bit more.
link |
01:05:06.640
Why do we not have a connection between a gene and a disease, but we have a connection
link |
01:05:11.240
between a genetic region and a disease?
link |
01:05:13.480
The reason for that is that 93% of genetic variants that are associated with disease
link |
01:05:21.840
don't impact the protein at all.
link |
01:05:27.180
So if you look at the human genome, there are 20,000 genes and 3.2 billion nucleotides.
link |
01:05:33.340
Only 1.5% of the genome codes for proteins.
link |
01:05:40.120
The other 98.5% does not code for proteins.
link |
01:05:46.160
If you now look at where the disease variants are located, 93% of them fall in the portion outside
link |
01:05:54.440
the genes.
link |
01:05:55.720
Of course, genes are enriched, but they're only enriched by a factor of three.
link |
01:06:00.600
That means that still 93% of genetic variants fall outside the proteins.
link |
01:06:06.880
Why is that difficult?
link |
01:06:08.200
Why is that a problem?
link |
01:06:09.480
The problem is that when a variant falls outside the gene, you don't know what gene is impacted
link |
01:06:15.900
by that variant.
link |
01:06:16.900
You can't just say, oh, it's near this gene, let's just connect that variant to the gene.
link |
01:06:21.160
And the reason for that is that the genome circuitry is very often long range.
link |
01:06:27.880
So you basically have that genetic variant that could sit in the intron of one gene.
link |
01:06:34.880
An intron is sort of the place between the exons that code for proteins.
link |
01:06:38.120
So genes are split up into exons and introns, and every exon codes for a particular subset
link |
01:06:43.560
of amino acids; the exons are spliced together to make the final protein.
link |
01:06:49.220
So that genetic variant might be sitting in an intron of a gene.
link |
01:06:51.900
It's transcribed with the gene, it's processed and then excised, but it might not impact
link |
01:06:56.320
this gene at all.
link |
01:06:57.320
It might actually impact another gene that's a million nucleotides away.
link |
01:07:01.080
So it's just riding along even though it has nothing to do with this nearby neighborhood.
link |
01:07:05.840
That's exactly right.
link |
01:07:06.840
Let me give you an example.
link |
01:07:09.600
The strongest genetic association with obesity was discovered in this FTO gene, the fat mass and
link |
01:07:16.520
obesity-associated gene.
link |
01:07:18.400
So this FTO gene was studied ad nauseam.
link |
01:07:23.780
People did tons of experiments on it.
link |
01:07:26.740
They figured out that FTO is in fact an RNA demethylase.
link |
01:07:33.000
It basically impacts something that we call the epitranscriptome.
link |
01:07:38.880
Just like the genome can be modified, the transcriptome, the transcript of the genes
link |
01:07:43.520
can be modified.
link |
01:07:44.900
And we basically said, oh great, that means that epitranscriptomics is hugely involved
link |
01:07:49.320
in obesity because that gene FTO is clearly where the genetic locus is at.
link |
01:07:56.880
My group studied FTO in collaboration with a wonderful team led by Melina Claussnitzer.
link |
01:08:04.400
And what we found is that this FTO locus, even though it is associated with obesity,
link |
01:08:11.800
does not implicate the FTO gene.
link |
01:08:16.680
The genetic variant is in the first intron of the FTO gene, but it controls two genes,
link |
01:08:22.840
IRX3 and IRX5, that are sitting 1.2 million nucleotides away, several genes away.
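To make the nearest-gene pitfall concrete, here is a minimal sketch contrasting naive nearest-gene assignment with assignment by long-range contacts. The coordinates and the contact table are made up for illustration (they are not real genome positions); only the gene names and the roughly 1.2-million-nucleotide distance come from the conversation:

```python
# Illustrative, made-up coordinates (NOT real genome positions).
GENES = {
    "FTO":  (1_000_000, 1_400_000),
    "IRX3": (2_250_000, 2_255_000),
    "IRX5": (2_500_000, 2_505_000),
}
VARIANT_POS = 1_050_000  # falls inside the FTO interval, like an intronic variant

# Hypothetical long-range contacts: variant position -> genes it physically touches.
CONTACTS = {VARIANT_POS: {"IRX3", "IRX5"}}

def distance_to_gene(pos, interval):
    """Distance from a position to a gene interval (0 if the position is inside it)."""
    start, end = interval
    if start <= pos <= end:
        return 0
    return min(abs(pos - start), abs(pos - end))

def nearest_gene(pos):
    # Naive rule: assign the variant to whichever gene is closest on the linear genome.
    return min(GENES, key=lambda g: distance_to_gene(pos, GENES[g]))

def contact_genes(pos):
    # Circuitry-aware rule: assign the variant to the genes it is looped to.
    return sorted(CONTACTS.get(pos, set()))

print(nearest_gene(VARIANT_POS))   # naive assignment
print(contact_genes(VARIANT_POS))  # assignment using long-range contacts
```

The naive rule returns FTO, while the contact-based rule points to IRX3 and IRX5, which is the gap between a genetic region and a gene that the conversation describes.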
link |
01:08:32.120
Oh boy.
link |
01:08:33.120
What am I supposed to feel about that because isn't that like super complicated then?
link |
01:08:38.880
So the way that I was introduced at a conference a few years ago was, and here's Manolis Kellis
link |
01:08:43.640
who wrote the most depressing paper of 2015.
link |
01:08:48.720
And the reason for that is that the entire pharmaceutical industry was so comfortable
link |
01:08:52.080
that there was a single gene in that locus.
link |
01:08:56.120
Because in some loci, you basically have three dozen genes that are all sitting in the same
link |
01:08:59.580
region of association and you're like, oh gosh, which one of those is it?
link |
01:09:04.060
But even that question of which one of those it is makes the assumption that it is
link |
01:09:08.120
one of those as opposed to some random gene just far, far away, which is what our paper
link |
01:09:13.680
showed.
link |
01:09:14.680
So basically what our paper showed is that you can't ignore the circuitry.
link |
01:09:19.040
You have to first figure out the circuitry, all of those long range interactions, how
link |
01:09:23.460
every genetic variant impacts the expression of every gene in every tissue imaginable across
link |
01:09:28.820
hundreds of individuals.
link |
01:09:30.960
And then you now have one of the building blocks, not even all of the building blocks
link |
01:09:35.560
for then going and understanding disease.
link |
01:09:41.440
So embrace the wholeness of the circuitry.
link |
01:09:44.920
Correct.
link |
01:09:45.920
So back to the question of starting from knowing nothing about the disease and going to the treatment.
link |
01:09:51.760
So what are the next steps?
link |
01:09:53.520
So you basically have to first figure out the tissue. Let me describe how you figure
link |
01:09:57.240
out the tissue.
link |
01:09:58.240
You figure out the tissue by taking all of these noncoding variants that are sitting
link |
01:10:01.740
outside proteins and then figuring out what are the epigenomic enrichments.
link |
01:10:06.840
And the reason for that, you know, thankfully is that there is convergence, that the same
link |
01:10:13.860
processes are impacted in different ways by different loci.
link |
01:10:19.440
And that's a saving grace for our field.
link |
01:10:23.080
The fact that if I look at hundreds of genetic variants associated with Alzheimer's, they
link |
01:10:27.800
localize in a small number of processes.
link |
01:10:31.920
Can you clarify why that's hopeful?
link |
01:10:34.640
So like they show up in the same exact way, in the specific set of processes?
link |
01:10:40.080
Yeah.
link |
01:10:41.080
So basically there's a small number of biological processes that underlie, or at least that
link |
01:10:45.380
play the biggest role in every disorder.
link |
01:10:48.580
So in Alzheimer's you basically have, you know, maybe 10 different types of processes.
link |
01:10:54.040
One of them is lipid metabolism.
link |
01:10:56.360
One of them is immune cell function.
link |
01:10:58.920
One of them is neuronal energetics.
link |
01:11:02.400
So these are just a small number of processes, but you have multiple lesions, multiple genetic
link |
01:11:07.760
perturbations that are associated with those processes.
link |
01:11:10.980
So if you look at schizophrenia, it's excitatory neuron function, it's inhibitory neuron function,
link |
01:11:15.800
it's synaptic pruning, it's calcium signaling and so on and so forth.
link |
01:11:18.940
So when you look at disease genetics, you have one hit here and one hit there and one
link |
01:11:24.840
hit there and one hit there, completely different parts of the genome.
link |
01:11:28.200
But it turns out all of those hits are calcium signaling proteins.
link |
01:11:31.640
Oh, cool.
link |
01:11:32.640
You're like, aha.
link |
01:11:34.600
That means that calcium signaling is important.
link |
01:11:37.420
So those people who are focusing on one gene at a time cannot possibly see that picture.
link |
01:11:42.640
You have to become a genomicist.
link |
01:11:44.560
You have to look at the -omics, the -ome, the holistic picture to understand these enrichments.
link |
01:11:51.400
But you mentioned the convergence thing.
link |
01:11:54.080
Whatever is associated with the disease shows up in the same processes.
link |
01:11:58.400
So let me explain convergence.
link |
01:12:00.200
Convergence is such a beautiful concept.
link |
01:12:03.580
So you basically have these four genes that are converging on calcium signaling.
link |
01:12:12.480
So that basically means that they are acting each in their own way, but together in the
link |
01:12:18.040
same process.
link |
01:12:19.820
But now in every one of these loci, you have many enhancers controlling each of those genes.
link |
01:12:27.600
That's another type of convergence where dysregulation of seven different enhancers might all converge
link |
01:12:33.280
on dysregulation of that one gene, which then converges on calcium signaling.
link |
01:12:39.280
And in each one of those enhancers, you might have multiple genetic variants distributed
link |
01:12:44.160
across many different people.
link |
01:12:46.960
Everyone has their own different mutation.
link |
01:12:49.840
But all of these mutations are impacting that enhancer.
link |
01:12:52.880
And all of these enhancers are impacting that gene.
link |
01:12:55.160
And all of these genes are impacting this pathway.
link |
01:12:57.560
And all these pathways are acting in the same tissue.
link |
01:13:00.020
And all of these tissues are converging together on the same biological process of schizophrenia.
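The layered convergence above (variants to enhancers to genes to pathways) can be pictured as repeated grouping through mapping tables. A minimal sketch; every variant, enhancer, and gene identifier here is hypothetical, and only "calcium signaling" is taken from the conversation:

```python
from collections import defaultdict

# Hypothetical toy mappings; every identifier below is made up for illustration.
VARIANT_TO_ENHANCER = {"v1": "enh1", "v2": "enh1", "v3": "enh2", "v4": "enh3", "v5": "enh4"}
ENHANCER_TO_GENE = {"enh1": "geneA", "enh2": "geneA", "enh3": "geneB", "enh4": "geneC"}
GENE_TO_PATHWAY = {"geneA": "calcium signaling", "geneB": "calcium signaling",
                   "geneC": "calcium signaling"}

def converge(items, mapping):
    """Group items under the next level up: returns {parent: sorted list of children}."""
    grouped = defaultdict(list)
    for item in items:
        grouped[mapping[item]].append(item)
    return {parent: sorted(children) for parent, children in grouped.items()}

enhancers = converge(VARIANT_TO_ENHANCER, VARIANT_TO_ENHANCER)  # variants -> enhancers
genes = converge(enhancers, ENHANCER_TO_GENE)                   # enhancers -> genes
pathways = converge(genes, GENE_TO_PATHWAY)                     # genes -> pathways
print(pathways)
```

Five distinct variants collapse onto four enhancers, three genes, and finally a single pathway, which is why a genome-wide view can see a signal that no single locus reveals.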
link |
01:13:05.280
And you're saying the saving grace is that that convergence seems to happen for a lot
link |
01:13:09.960
of these diseases.
link |
01:13:11.120
For all of them.
link |
01:13:12.180
Basically that for every single disease that we've looked at, we have found an epigenomic
link |
01:13:17.200
enrichment.
link |
01:13:18.500
How do you do that?
link |
01:13:19.500
You basically have all of the genetic variants associated with the disorder.
link |
01:13:24.040
And then you're asking for all of the enhancers active in a particular tissue.
link |
01:13:28.080
For 540 disorders, we've basically found that indeed there is an enrichment.
link |
01:13:33.760
That basically means that there is commonality.
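One standard way to ask whether a disorder's variants fall in a tissue's enhancers more often than chance is a fold enrichment plus a one-sided hypergeometric test. This is a minimal sketch under that assumption, with toy counts; it is not necessarily the exact statistic used in the work discussed:

```python
from math import comb

def enrichment(N, K, n, k):
    """Fold enrichment and one-sided hypergeometric p-value.

    N: background variants, K: background variants in the tissue's enhancers,
    n: disease-associated variants, k: disease variants in those enhancers.
    """
    fold = (k / n) / (K / N)
    # P(X >= k) when drawing n of N without replacement with K "successes".
    p = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1)) / comb(N, n)
    return fold, p

# Toy numbers: 10% of background variants sit in the tissue's enhancers,
# but 8 of 20 disease variants (40%) do.
fold, p = enrichment(N=1000, K=100, n=20, k=8)
print(f"fold enrichment = {fold:.1f}, p = {p:.2g}")
```

Repeating this test for every tissue map and keeping the tissues with strong enrichment is the "commonality" that points to the relevant tissue.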
link |
01:13:37.060
And from the commonality, we can just get insights.
link |
01:13:40.600
So to explain in mathematical terms, we're basically building an empirical prior.
link |
01:13:47.120
We're using a Bayesian approach to basically say, great, all of these variants are equally
link |
01:13:52.600
likely in a particular locus to be important.
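The empirical-prior idea can be sketched with the common single-causal-variant fine-mapping model, where each variant's posterior is proportional to prior times Bayes factor. This is a minimal sketch assuming that model, made-up Bayes factors, and a hypothetical 10x prior weight for variants in the enriched tissue's enhancers:

```python
def posterior_inclusion(bayes_factors, prior_weights):
    """Single-causal-variant fine-mapping: posterior is proportional to prior x Bayes factor."""
    scores = [bf * w for bf, w in zip(bayes_factors, prior_weights)]
    total = sum(scores)
    return [s / total for s in scores]

# Toy locus: four co-inherited variants with identical statistical evidence.
bfs = [50.0, 50.0, 50.0, 50.0]
in_enriched_enhancer = [False, True, False, False]

uniform = posterior_inclusion(bfs, [1.0] * len(bfs))
# Hypothetical empirical prior: 10x weight for variants in the enriched tissue's enhancers.
informed = posterior_inclusion(bfs, [10.0 if hit else 1.0 for hit in in_enriched_enhancer])
print(uniform)                           # [0.25, 0.25, 0.25, 0.25]
print([round(p, 3) for p in informed])   # [0.077, 0.769, 0.077, 0.077]
```

With a flat prior the genetics alone cannot separate co-inherited variants; the prior learned from many loci breaks the tie.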
link |
01:13:57.200
So in a genetic locus, you basically have a dozen variants that are coinherited.
link |
01:14:02.800
Because the way that inheritance works in the human genome is through all of these recombination
link |
01:14:07.960
events during meiosis, you basically have, you know, you inherit maybe three, chromosome
link |
01:14:16.120
three, for example, in your body is inherited from four different parts.
link |
01:14:20.240
One part comes from your dad, another part comes from your mom, another part comes from
link |
01:14:23.840
your dad, another part comes from your mom.
link |
01:14:25.860
So basically, the way that it, sorry, from your mom's mom.
link |
01:14:30.200
So you basically have one copy that comes from your dad and one copy that comes from
link |
01:14:33.800
your mom.
link |
01:14:34.800
But that copy that you got from your mom is a mixture of her maternal and her paternal
link |
01:14:39.600
chromosome.
link |
01:14:41.000
And the copy that you got from your dad is a mixture of his maternal and his paternal
link |
01:14:44.680
chromosome.
link |
01:14:45.680
So these breakpoints that happen when chromosomes are lining up are basically ensuring through
link |
01:14:53.480
these crossover events, they're ensuring that every child cell during the process of meiosis,
link |
01:15:02.520
where you basically have, you know, one sperm cell that fuses with one egg cell
link |
01:15:08.560
to basically create the zygote.
link |
01:15:12.240
You basically have half of your genome that comes from dad and half your genome that comes
link |
01:15:16.440
from mom.
link |
01:15:17.440
But in order to line them up, you basically have these crossover events.
link |
01:15:21.040
These crossover events are basically leading to coinheritance of that entire block coming
link |
01:15:27.880
from your maternal grandmother and that entire block coming from your maternal grandfather.
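The block-wise inheritance described above can be sketched by assembling one recombinant chromosome from the two copies a parent carries. This is a minimal sketch in which the crossover positions are given by hand; it does not model where crossovers actually fall:

```python
def recombinant(copy_a, copy_b, crossovers):
    """Assemble one recombinant chromosome from two parental copies.

    copy_a, copy_b: equal-length allele sequences, e.g. the mother's copy inherited
    from her mother (maternal grandmother) and from her father (maternal grandfather).
    crossovers: sorted positions at which copying switches to the other copy.
    """
    copies = (copy_a, copy_b)
    which, start, child = 0, 0, []
    for pos in list(crossovers) + [len(copy_a)]:
        child.extend(copies[which][start:pos])  # copy one contiguous block
        which ^= 1                               # switch to the other parental copy
        start = pos
    return child

grandmother = "G" * 10  # alleles the mother got from her mother
grandfather = "F" * 10  # alleles the mother got from her father
gamete = recombinant(grandmother, grandfather, [3, 7])
print("".join(gamete))  # GGGFFFFGGG
```

Everything between two crossover points travels together, which is why nearby variants end up co-inherited.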
link |
01:15:33.920
Over many generations, these crossover events don't happen randomly.
link |
01:15:38.800
There's a protein called PRDM9 that basically guides the double-strand breaks and then
link |
01:15:45.720
leads to these crossovers.
link |
01:15:48.320
And that protein has a particular preference for only a small number of hotspots of recombination,
link |
01:15:54.240
which then lead to a small number of breaks between these coinheritance patterns.
link |
01:15:59.880
So even though there are 6 million variants, there are 6 million loci, this variation is
link |
01:16:06.720
inherited in blocks and every one of these blocks has like two dozen genetic variants
link |
01:16:12.600
that are all associated.
link |
01:16:13.600
So in the case of FTO, it wasn't just one variant, it was 89 common variants that were
link |
01:16:19.840
all humongously associated with obesity.
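Why 89 variants can all look associated: variants in the same co-inherited block are highly correlated, which is usually measured as r-squared linkage disequilibrium. This is a minimal sketch on toy phased haplotypes (one 0/1 entry per chromosome), assumed for illustration:

```python
def r_squared(x, y):
    """Squared Pearson correlation between two 0/1 allele vectors (haplotypes)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov * cov / (vx * vy)

# Toy haplotypes. v1 and v2 sit in the same block and are always co-inherited;
# v3 is unlinked to them.
v1 = [1, 1, 0, 0, 1, 0, 0, 1]
v2 = [1, 1, 0, 0, 1, 0, 0, 1]
v3 = [1, 0, 0, 1, 0, 1, 0, 1]
print(r_squared(v1, v2), r_squared(v1, v3))
```

If v1 is causal, v2 shows exactly the same association with disease, so the genetics of one locus alone cannot tell them apart.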
link |
01:16:24.320
Which one of those is the important one?
link |
01:16:26.640
Well, if you look at only one locus, you have no idea.
link |
01:16:29.640
But if you look at many loci, you basically say, aha, all of them are enriching in the
link |
01:16:36.880
same epigenomic map.
link |
01:16:40.080
In that particular case, it was mesenchymal stem cells.
link |
01:16:44.160
So these are the progenitor cells that give rise to your brown fat and your white fat.
link |
01:16:50.560
Progenitor is like the early on developmental stem cells?
link |
01:16:54.020
So you start from one zygote and that's a totipotent cell type.
link |
01:16:58.120
It can do anything.
link |
01:17:00.000
You then, you know, that cell divides, divides, divides, and then every cell division is leading
link |
01:17:08.280
to specialization, where you now have a mesodermal lineage, an ectodermal lineage, and an endodermal
link |
01:17:14.880
lineage that basically leads to different parts of your body.
link |
01:17:19.320
The ectoderm will basically give rise to your skin, ecto means outside, derm is skin.
link |
01:17:25.840
So ectoderm, but it also gives rise to your neurons and your whole brain.
link |
01:17:29.640
So that's a lot of ectoderm.
link |
01:17:31.600
Mesoderm gives rise to your internal organs, including the vasculature and you know, your
link |
01:17:36.880
muscle and stuff like that.
link |
01:17:38.440
So you basically have this progressive differentiation and then if you look further, further down
link |
01:17:45.080
that lineage, you basically have one lineage that will give rise to both your muscle and
link |
01:17:49.700
your bone, but also your fat.
link |
01:17:52.880
And if you go further down the lineage of your fat, you basically have your white fat
link |
01:17:57.720
cells.
link |
01:17:59.040
These are the cells that store energy.
link |
01:18:01.640
So when you eat a lot, but you don't exercise too much, there's an excess set of calories,
link |
01:18:06.640
excess energy.
link |
01:18:07.640
What do you do with those?
link |
01:18:08.640
You basically create, you spend a lot of that energy to create these high energy molecules,
link |
01:18:13.520
lipids, which you can then burn when you need them on a rainy day.
link |
01:18:19.840
So that leads to obesity if you don't exercise and if you overeat because your body's like,
link |
01:18:26.320
oh great, I have all these calories.
link |
01:18:27.680
I'm going to store them.
link |
01:18:28.680
Ooh, more calories.
link |
01:18:29.680
I'm going to store them too.
link |
01:18:30.680
Ooh, more calories.
link |
01:18:31.680
So basically 42% of European chromosomes carry a predisposition to storing fat, which
link |
01:18:40.280
was probably selected for during periods of food scarcity, like basically as we were exiting
link |
01:18:48.880
Africa, before and during the ice ages, there was probably selection for those individuals
link |
01:18:54.240
who made it north to basically be able to store energy, a lot more energy.
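A worked example of what "42% of chromosomes" implies for people. It treats 0.42 as the frequency p of the fat-storing haplotype and assumes Hardy-Weinberg proportions (random mating), which the conversation does not state:

```latex
\begin{aligned}
p &= 0.42, \qquad q = 1 - p = 0.58 \\
\Pr(\text{two copies}) &= p^2 = 0.1764 \approx 17.6\% \\
\Pr(\text{exactly one copy}) &= 2pq = 0.4872 \approx 48.7\% \\
\Pr(\text{at least one copy}) &= 1 - q^2 = 1 - 0.3364 = 0.6636 \approx 66.4\%
\end{aligned}
```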
link |
01:19:00.880
So you basically now have this lineage that is deciding whether you want to store energy
link |
01:19:07.160
in your white fat or burn energy in your beige fat.
link |
01:19:11.160
It turns out that your fat is, you know, like we have such a bad view of fat.
link |
01:19:18.680
Fat is your best friend.
link |
01:19:20.160
Fat can both store all these excess lipids that would be otherwise circulating through
link |
01:19:24.500
your body and causing damage, but it can also burn calories directly.
link |
01:19:29.900
If you have too much energy, you can just choose to just burn some of that as heat.
link |
01:19:35.760
So basically when you're cold, you're burning energy to basically warm your body up and
link |
01:19:41.200
you're burning all these lipids and you're burning all these calories.
link |
01:19:44.540
So what we basically found is that across the board, genetic variants associated with
link |
01:19:50.000
obesity across many of these regions were all enriched repeatedly in mesenchymal stem
link |
01:19:56.520
cell enhancers.
link |
01:19:58.360
So that gave us a hint as to which of these genetic variants was likely driving this whole
link |
01:20:05.120
association.
link |
01:20:06.120
And we ended up with this one genetic variant called rs1421085.
link |
01:20:14.440
And that genetic variant out of the 89 was the one that we predicted to be causal for
link |
01:20:20.040
the disease.
link |
01:20:21.040
Wow.
link |
01:20:22.040
So going back to those steps, first step is figure out the relevant tissue based on the
link |
01:20:26.240
global enrichment.
link |
01:20:27.960
Second step is figure out the causal variant among many variants in this linkage disequilibrium
link |
01:20:34.840
block, this coinherited block between these recombination hotspots, these boundaries of these inherited
link |
01:20:41.160
blocks.
link |
01:20:42.640
That's the second step.
link |
01:20:43.920
The third step is once you know that causal variant, try to figure out what is the motif
link |
01:20:49.920
that is disrupted by that causal variant.
link |
01:20:52.720
Basically how does it act?
link |
01:20:54.400
Variants don't just disrupt elements, they disrupt the binding of specific regulators.
link |
01:20:59.520
So basically the third step there was how do you find the motif, the
link |
01:21:04.440
gene regulatory word, the building block of gene regulation, that is responsible
link |
01:21:10.240
for that dysregulatory event.
link |
01:21:12.480
And the fourth step is finding out what regulator normally binds that motif and is now no longer
link |
01:21:18.280
able to bind.
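Steps three and four hinge on whether a variant weakens a regulator's binding motif. A common way to quantify that is to score the reference and alternate sequences against a position weight matrix (log-odds versus background). This is a minimal sketch; the motif, the sequences, and the uniform background are all hypothetical:

```python
from math import log2

# Hypothetical 4-position motif given as per-position base probabilities.
MOTIF = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assume uniform base composition

def best_score(seq):
    """Best log-odds motif match (in bits) over all windows of seq."""
    w = len(MOTIF)
    return max(
        sum(log2(MOTIF[i][seq[s + i]] / BACKGROUND) for i in range(w))
        for s in range(len(seq) - w + 1)
    )

ref = "GCATATGC"  # reference allele: contains a strong ATAT match
alt = "GCATCTGC"  # single-nucleotide change at index 4 (A -> C)
print(round(best_score(ref), 2), round(best_score(alt), 2))
```

A large drop in the best score for the alternate allele suggests the variant disrupts that motif; the next question is which regulator normally binds it.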
link |
01:21:19.280
And then once you have the regulator, can you then try to figure out, after the disease has
link |
01:21:24.920
developed, how to fix it?
link |
01:21:27.200
That's exactly right.
link |
01:21:28.200
You now know how to intervene.
link |
01:21:30.260
You have basically a regulator, you have a gene that you can then perturb and you say,
link |
01:21:34.520
well, maybe that regulator has a global role in obesity.
link |
01:21:38.640
I can perturb the regulator.
link |
01:21:40.360
Just to clarify, when we say perturb, like on the scale of a human life, can a human
link |
01:21:46.760
being be helped?
link |
01:21:49.000
Of course.
link |
01:21:50.000
Yeah.
link |
01:21:51.000
I guess understanding is the first step.
link |
01:21:52.480
No, no, but perturb basically means you now develop therapeutics, pharmaceutical therapeutics
link |
01:21:57.480
against that.
link |
01:21:59.340
Or you develop other types of intervention that affect the expression of that gene.
link |
01:22:03.800
What do pharmaceutical therapeutics look like when your understanding is on a genetic level?
link |
01:22:11.040
Yeah.
link |
01:22:12.040
Sorry if it's a dumb question.
link |
01:22:13.040
No, no, no.
link |
01:22:14.040
It's a brilliant question, but I want to save it for a little bit later when we start talking
link |
01:22:16.440
about therapeutics.
link |
01:22:17.440
Perfect.
link |
01:22:18.440
So let's talk about the first four steps.
link |
01:22:20.280
There are two more.
link |
01:22:21.600
So basically the first step is figure out, I mean, the zeroth step, the starting point
link |
01:22:25.600
is the genetics.
link |
01:22:26.760
The first step after that is figure out the tissue of action.
link |
01:22:31.100
The second step is figuring out the nucleotide that is responsible or set of nucleotides.
link |
01:22:36.920
The third step is figuring out the motif, and number four is the upstream regulator.
link |
01:22:40.960
Numbers five and six are: what are the targets?
link |
01:22:44.320
So number five is: great.
link |
01:22:45.800
Now I know the regulator.
link |
01:22:47.200
I know the motif.
link |
01:22:48.200
I know the tissue and I know the variant.
link |
01:22:51.460
What does it actually do?
link |
01:22:53.400
So you have to now trace it to the biological process and the genes that mediate that biological
link |
01:22:59.240
process.
link |
01:23:00.480
So knowing all of this can now allow you to find the target genes.
link |
01:23:05.400
How?
link |
01:23:06.400
By basically doing perturbation experiments or by looking at the folding of the epigenome
link |
01:23:13.200
or by looking at the genetic impact of that genetic variant on the expression of genes.
link |
01:23:19.440
And we use all three.
link |
01:23:21.580
So let me go through them.
link |
01:23:22.800
Basically one of them is physical links.
link |
01:23:26.360
This is the folding of the genome onto itself.
link |
01:23:29.920
How do you even figure out the folding?
link |
01:23:32.200
It's a little bit of a tangent, but it's a super awesome technology.
link |
01:23:36.760
Think of the genome as again, this massive packaging that we talked about of taking two
link |
01:23:41.960
meters worth of DNA and putting it in something that's a million times smaller than two meters
link |
01:23:48.960
worth of DNA.
link |
01:23:49.960
That's a single cell.
link |
01:23:51.760
You basically have this massive packaging and this packaging basically leads to the
link |
01:23:56.160
chromosome being wrapped around in sort of tight, tight ways in ways, however, that are
link |
01:24:02.600
functionally capable of being reopened and reclosed.
link |
01:24:07.080
So I can then go in and figure out that folding by sort of chopping up the spaghetti soup,
link |
01:24:15.000
putting glue and ligating the segments that were chopped up but nearby each other, and
link |
01:24:21.000
then sequencing through these ligation events to figure out that this region of this chromosome,
link |
01:24:26.020
that region of the chromosome were near each other.
link |
01:24:28.360
That means they were interacting even though they were far away on the genome itself.
link |
01:24:33.560
So that chopping up, sequencing, and re-gluing is basically giving you folds of the genome
link |
01:24:42.500
that we call.
link |
01:24:43.500
Sorry, can you backtrack?
link |
01:24:44.500
Of course.
link |
01:24:45.500
How does cutting it help you figure out which ones were close in the original folding?
link |
01:24:50.600
So you have a bowl of noodles.
link |
01:24:53.440
Go on.
link |
01:24:54.760
And in that bowl of noodles, some noodles are near each other.
link |
01:24:59.480
Yes.
link |
01:25:00.480
So you throw in a bunch of glue, you basically freeze the noodles in place, throw in a cutter
link |
01:25:06.520
that chops up the noodles into little pieces.
link |
01:25:10.860
Now throw in some ligation enzyme that lets those pieces that were free religate near
link |
01:25:18.040
each other.
link |
01:25:19.080
In some cases, they religate what you had just cut, but that's very rare.
link |
01:25:24.240
Most of the time they will religate with whatever was proximal.
link |
01:25:30.320
You now have glued the red noodle that was crossing the blue noodle to each other.
link |
01:25:36.760
You then reverse the glue, the glue goes away and you just sequence the heck out of it.
link |
01:25:43.020
Most of the time you'll find red segment with, you know, red segment, but you can specifically
link |
01:25:48.640
select for ligation events that have happened that were not from the same segment by sort
link |
01:25:52.640
of marking them in a particular way and then selecting those and then you sequence and
link |
01:25:57.360
you look for red with blue matches of sort of things that were glued that were not immediately
link |
01:26:03.400
proximal to each other.
link |
01:26:05.520
And that reveals the linking of the blue noodle and the red noodle.
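The "sequence the heck out of it and keep only red-with-blue junctions" step can be sketched in a few lines. This is a minimal sketch, assuming each sequenced junction has already been mapped to a pair of genomic coordinates; the input format, the fixed bin size, and the name count_contacts are illustrative assumptions, not any specific Hi-C pipeline. Junctions whose two ends fall in the same bin are dropped as same-segment religations, and the rest are tallied per pair of bins.

```python
from collections import Counter

def count_contacts(read_pairs, bin_size):
    """Tally ligation junctions between distinct genomic bins.

    read_pairs: iterable of ((chrom, pos), (chrom, pos)) tuples, one per
    sequenced junction (hypothetical, already-mapped input format).
    """
    contacts = Counter()
    for (chrom1, pos1), (chrom2, pos2) in read_pairs:
        bin1 = (chrom1, pos1 // bin_size)
        bin2 = (chrom2, pos2 // bin_size)
        if bin1 == bin2:
            # Same segment glued back to itself: says nothing about folding.
            continue
        # Order the pair so (A, B) and (B, A) count as the same contact.
        contacts[tuple(sorted((bin1, bin2)))] += 1
    return contacts


# Toy junctions on one chromosome, binned at 1 kb.
pairs = [
    (("chr1", 100), ("chr1", 150)),    # same bin: discarded
    (("chr1", 100), ("chr1", 5_100)),  # bin 0 <-> bin 5
    (("chr1", 5_200), ("chr1", 120)),  # same contact, reversed order
]
contacts = count_contacts(pairs, bin_size=1_000)
print(contacts)
```

Bins that sit far apart on the chromosome yet collect many junctions are the candidates for long-range folding; real analyses also filter artifacts and normalize for coverage before calling domains.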
link |
01:26:08.640
You're with me so far?
link |
01:26:09.640
Yeah.
link |
01:26:10.640
Good.
link |
01:26:11.640
So we've done these experiments.
link |
01:26:12.640
That's the physical.
link |
01:26:13.640
That's the physical.
link |
01:26:14.640
That's step one of the physical.
link |
01:26:15.820
And what the physical revealed is topologically associated domains, basically big blocks of
link |
01:26:20.000
the genome that are topologically connected together.
link |
01:26:25.040
That's the physical.
link |
01:26:26.300
The second one is the genetic links.
link |
01:26:30.060
It basically says across individuals that have different genetic variants, how are their
link |
01:26:37.220
genes expressed differently?
link |
01:26:39.400
Remember before I was saying that the path between genetics and disease is enormous,
link |
01:26:43.080
but we can break it up to look at the path between genetics and gene expression.
link |
01:26:47.520
So instead of using Alzheimer's as a phenotype, I can now use expression of IRX3 as the phenotype,
link |
01:26:54.480
expression of gene A. And I can look at all of the humans who contain a G at that location
link |
01:27:01.160
and all the humans that contain a T at that location and basically say, wow, it turns
link |
01:27:05.360
out that the expression of each gene is higher for the T humans than for the G humans at
link |
01:27:09.480
that location.
link |
01:27:10.660
So that basically gives me a genetic link between a genetic variant, a locus, a region,
link |
01:27:16.560
and the expression of nearby genes.
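The genetic link boils down to asking whether expression rises or falls with the number of copies of one allele. A minimal sketch, assuming genotypes are given as two-letter strings such as "GT" and expression as one number per person: it regresses expression on the count of T alleles (0, 1 or 2). The function name eqtl_slope and the toy values are hypothetical; real eQTL studies also adjust for covariates and for testing many variant-gene pairs.

```python
def eqtl_slope(genotypes, expression, effect_allele="T"):
    """Least-squares slope of expression on effect-allele dosage (0, 1 or 2)."""
    dosages = [g.count(effect_allele) for g in genotypes]
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals share the same genotype")
    return sxy / sxx


# Toy cohort: one entry per person (hypothetical values).
genotypes = ["GG", "GT", "TT", "GG", "TT"]
expression = [1.0, 2.0, 3.0, 1.0, 3.0]
slope = eqtl_slope(genotypes, expression)
print(slope)  # positive: expression rises with each extra T allele
```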
link |
01:27:19.960
Good on the genetic link?
link |
01:27:20.960
I think so.
link |
01:27:21.960
Awesome.
link |
01:27:22.960
The third link is the activity link.
link |
01:27:25.480
What's an activity link?
link |
01:27:26.480
It basically says if I look across 833 different epigenomes, whenever this enhancer is active,
link |
01:27:34.320
this gene is active.
link |
01:27:36.040
That gives me an activity link between this region of the DNA and that gene.
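The activity link is a correlation computed across many epigenomes. A minimal sketch, assuming enhancer activity and gene expression are each stored as one value per epigenome in the same sample order; the names activity_links and min_r, the 0.8 cutoff, and the five-sample toy data are assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def activity_links(enhancer_activity, gene_expression, min_r=0.8):
    """Pair each enhancer with genes whose expression tracks its activity.

    Both dicts map an id to one value per epigenome, in the same sample order.
    """
    links = []
    for enhancer, activity in enhancer_activity.items():
        for gene, expr in gene_expression.items():
            r = pearson(activity, expr)
            if r >= min_r:
                links.append((enhancer, gene, r))
    return links


# Five toy epigenomes (the comparison described above spans hundreds).
enhancers = {"enh1": [0, 1, 0, 1, 1]}
genes = {
    "geneA": [0.1, 0.9, 0.2, 1.0, 0.8],  # on when enh1 is on
    "geneB": [1, 0, 1, 0, 0],            # on when enh1 is off
}
links = activity_links(enhancers, genes)
print(links)
```

This only captures correlational evidence; the perturbation experiments are what test whether the enhancer actually drives the gene.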
link |
01:27:42.340
And then the fourth one is perturbations where I can go in and blow up that region and see
link |
01:27:47.140
what are the genes that change in expression, or I can go in and over activate that region
link |
01:27:51.900
and see what genes change in expression.
link |
01:27:55.120
So I guess that's similar to activity?
link |
01:27:57.240
Yeah.
link |
01:27:58.240
Yeah.
link |
01:27:59.240
So that's basically similar to activity.
link |
01:28:00.240
I agree, but it's causal rather than correlational.
link |
01:28:02.760
Again, I'm a little weird.
link |
01:28:04.960
No, no, you're 100% on.
link |
01:28:07.160
It's exactly the same as the perturbation where I go in and intervene.
link |
01:28:11.440
I basically take a bunch of cells.
link |
01:28:13.800
So you know CRISPR, right?
link |
01:28:16.160
CRISPR is this genome guidance and cutting mechanism.
link |
01:28:21.500
That's what George Church likes to call genome vandalism.
link |
01:28:24.680
So you basically are able to, you can basically take a guide RNA that you put into the CRISPR
link |
01:28:32.720
system, and the CRISPR system will basically use this guide RNA, scan the genome, find
link |
01:28:38.200
wherever there's a match, and then cut the genome.
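A minimal sketch of "scan the genome, find wherever there's a match, and cut", assuming the widely used SpCas9 enzyme, which needs an NGG motif (the PAM) right after the 20-nucleotide target and cuts about three bases upstream of it. The sketch searches one strand only, demands a perfect match, and uses a made-up sequence; find_cas9_sites is a hypothetical helper name.

```python
import re

def find_cas9_sites(genome, guide):
    """Return cut positions for perfect guide matches followed by an NGG PAM.

    Searches only the given strand. A returned index c means the break falls
    between genome[c - 1] and genome[c].
    """
    target = guide.upper().replace("U", "T")  # RNA guide -> matching DNA sequence
    # Lookahead so overlapping sites are all reported.
    pattern = re.compile("(?=" + re.escape(target) + "[ACGT]GG)")
    return [m.start() + len(target) - 3 for m in pattern.finditer(genome.upper())]


guide = "GACGCAUAAAGAUGAGACGC"  # hypothetical 20-nt RNA guide
genome = "TTTT" + "GACGCATAAAGATGAGACGC" + "TGG" + "AAAA"
print(find_cas9_sites(genome, guide))
```

The same target without a following NGG is not cut, which is why a guide alone is not quite enough to pick a site.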
link |
01:28:42.560
So I digress, but it's a bacterial immune defense system.
link |
01:28:48.000
So basically bacteria are constantly attacked by viruses, but sometimes they win against
link |
01:28:54.280
the viruses and they chop up these viruses.
link |
01:28:56.960
And they remember them as a trophy: inside their genome, they have these loci, these CRISPR loci. CRISPR
link |
01:29:02.800
basically stands for clustered regularly interspaced short palindromic repeats.
link |
01:29:06.400
So basically it's an interspersed repeats structure where basically you have a set of
link |
01:29:11.900
repetitive regions, and interspersed between them are these variable segments that were basically
link |
01:29:17.400
matching viruses.
link |
01:29:19.600
So when this was first discovered, it was basically hypothesized that this is probably
link |
01:29:24.240
a bacterial immune system that remembers the trophies of the viruses it managed to kill.
link |
01:29:30.360
And then the bacteria pass on, you know, they sort of do lateral transfer of DNA and they
link |
01:29:34.720
pass on these memories so that the next bacterium says, Ooh, you killed that guy.
link |
01:29:39.120
When that guy shows up again, I will recognize him.
link |
01:29:41.700
And the CRISPR system was basically evolved as a bacterial adaptive immune response to
link |
01:29:47.320
sense foreigners that should not belong and to just go and cut their genome.
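The repeat, spacer, repeat layout and the "I will recognize him" step can be sketched as string operations. A minimal sketch under stated assumptions: the repeat and spacer sequences are invented and much shorter than real ones, and recognition here is plain substring matching. In real Cas9 systems the target must also sit next to a PAM, which is part of how the bacterium avoids cutting its own CRISPR array; that check is omitted.

```python
def extract_spacers(crispr_array, repeat):
    """Split a repeat-spacer-repeat-... array into its spacers."""
    return [piece for piece in crispr_array.split(repeat) if piece]

def recognizes(phage_genome, spacers):
    """True if any stored spacer occurs in the incoming phage genome."""
    return any(spacer in phage_genome for spacer in spacers)


repeat = "GTTTTAGAGC"  # hypothetical repeat unit
array = repeat + "AAACCCGGGT" + repeat + "TTTGGGCCCA" + repeat
spacers = extract_spacers(array, repeat)
print(spacers)
print(recognizes("CGCG" + "TTTGGGCCCA" + "CGCG", spacers))  # previously seen phage
print(recognizes("CGCGCGCGCGCGCGCG", spacers))              # new phage
```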
link |
01:29:52.560
So it's an RNA guided RNA cutting enzyme or an RNA guided DNA cutting enzyme.
link |
01:30:00.280
So there's different systems.
link |
01:30:02.240
Some of them cut DNA, some of them cut RNA, but all of them remember this sort of viral
link |
01:30:08.620
attack.
link |
01:30:10.660
So what we have done now as a field, you know, through the work of, you know, Jennifer
link |
01:30:15.920
Doudna, Emmanuelle Charpentier, Feng Zhang and many others, is co-opt that system of bacterial
link |
01:30:23.240
immune defense as a way to cut genomes.
link |
01:30:26.520
You basically have this guiding system that allows you to use an RNA guide to bring enzymes
link |
01:30:35.280
to cut DNA at a particular locus.
link |
01:30:37.760
That's so fascinating.
link |
01:30:39.240
So this is like already a natural mechanism, a natural tool for cutting that's useful in a
link |
01:30:45.600
particular context.
link |
01:30:46.600
And we're like, well, we can use that thing to actually, it's a nice tool that's already
link |
01:30:51.160
in the body.
link |
01:30:52.160
Yeah.
link |
01:30:53.160
Yeah.
link |
01:30:54.160
It's not in our body.
link |
01:30:55.160
It's in the bacterial body.
link |
01:30:56.160
It was discovered by the yogurt industry.
link |
01:30:59.320
They were trying to make better yogurts and they were trying to make their bacteria in
link |
01:31:03.640
their yogurt cultures more resilient to viruses.
link |
01:31:08.400
And they were studying bacteria and they found that, wow, this CRISPR system is awesome.
link |
01:31:12.480
It allows you to defend against that.
link |
01:31:14.820
And then it was coopted in mammalian systems that don't use anything like that as a targeting
link |
01:31:20.600
way to basically bring these DNA cutting enzymes to any locus in the genome.
link |
01:31:25.800
Why would you want to cut DNA to do anything?
link |
01:31:29.620
The reason is that our DNA has a DNA repair mechanism where if a region of the genome
link |
01:31:35.040
gets randomly cut, you will basically scan the genome for anything that matches and sort
link |
01:31:40.520
of use it by homology.
link |
01:31:43.480
So the reason why being diploid helps is that we have a spare copy.
link |
01:31:47.240
As soon as my mom's copy is deactivated, I can use my dad's copy.
link |
01:31:50.640
And somewhere else, if my dad's copy is deactivated, I can use my mom's copy to repair it.
link |
01:31:55.240
So this is called homology-directed repair.
link |
01:31:59.720
So all you have to do is the cutting and you don't have to do the fixing.
link |
01:32:04.080
That's exactly right.
link |
01:32:05.080
You don't have to do the fixing.
link |
01:32:06.080
Because it's already built in.
link |
01:32:07.320
That's exactly right.
link |
01:32:08.560
But the fixing can be coopted by throwing in a bunch of homologous segments that instead
link |
01:32:14.720
of having your dad's version, have whatever other version you'd like to use.
link |
01:32:19.960
So you then control the fixing by throwing in a bunch of other stuff.
link |
01:32:24.040
That's exactly right.
link |
01:32:25.040
And that's how you do genome editing.
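The "throw in your own band-aid" idea can be modeled as string surgery: a donor whose ends match the sequence on either side of the break, with the desired change in between. This is a toy sketch only; the sequences and the function name repair_with_donor are made up, and real repair is an enzymatic process whose efficiency is far from guaranteed. The single C to T swap mirrors the one-nucleotide edits discussed in this conversation but uses an invented sequence.

```python
def repair_with_donor(genome, cut, donor, arm_len):
    """Toy homology-directed repair at a double-strand break.

    donor = left homology arm + desired sequence + right homology arm.
    The arms are located in the genome on either side of the cut, and
    everything between them is replaced by the donor's version.
    """
    left_arm, right_arm = donor[:arm_len], donor[-arm_len:]
    left_start = genome.rfind(left_arm, 0, cut)  # arm must end at or before the cut
    right_start = genome.find(right_arm, cut)    # arm must start at or after the cut
    if left_start == -1 or right_start == -1:
        raise ValueError("homology arms not found on both sides of the cut")
    return genome[:left_start] + donor + genome[right_start + arm_len:]


original = "GGGG" + "ACGTTGCA" + "C" + "TAGGCATC" + "GGGG"
donor = "ACGTTGCA" + "T" + "TAGGCATC"  # same arms, C swapped for T
edited = repair_with_donor(original, cut=13, donor=donor, arm_len=8)
print(original)
print(edited)
```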
link |
01:32:26.440
So that's what CRISPR is.
link |
01:32:27.880
That's what CRISPR is.
link |
01:32:28.880
In popular culture, people use the term.
link |
01:32:30.840
I've never, wow, that's brilliant.
link |
01:32:32.640
So CRISPR is genome vandalism followed by a bunch of band-aids that have the sequence
link |
01:32:39.080
that you'd like.
link |
01:32:40.080
And you could control the choice of band-aids.
link |
01:32:43.000
Correct.
link |
01:32:44.000
And of course there's new generations of CRISPR.
link |
01:32:46.360
There's something that's called prime editing that was sort of very, very much in the press
link |
01:32:50.880
recently that basically instead of sort of making a double-stranded break, which again
link |
01:32:55.360
is genome vandalism, you basically make a single-stranded break.
link |
01:33:00.820
You basically just nick one of the two strands, enabling you to sort of peel off without sort
link |
01:33:06.640
of completely breaking it up and then repair it locally using a guide that is coupled to
link |
01:33:13.280
your initial RNA that took you to that location.
link |
01:33:18.600
Dumb question, but is CRISPR as awesome and cool as it sounds?
link |
01:33:24.000
I mean, technically speaking, in terms of like as a tool for manipulating our genetics
link |
01:33:31.820
in the positive meaning of the word manipulating, or are there downsides, drawbacks in this whole
link |
01:33:39.040
context of therapeutics that we're talking about or understanding and so on?
link |
01:33:42.920
So when I teach my students about CRISPR, I show them articles with the headline, genome
link |
01:33:50.040
editing tool revolutionizes biology.
link |
01:33:53.120
And then I show them the date of these articles and they're 2004, like five years before CRISPR
link |
01:33:58.360
was invented.
link |
01:33:59.760
And the reason is that they're not talking about CRISPR.
link |
01:34:02.360
They're talking about zinc finger nucleases that are another way to bring these cutters
link |
01:34:07.520
to the genome.
link |
01:34:09.040
It's a very difficult way of sort of designing the right set of zinc finger proteins, the
link |
01:34:13.880
right set of amino acids that will now target a particular long stretch of DNA because for
link |
01:34:20.280
every location that you want to target, you need to design a particular regulator, a particular
link |
01:34:25.760
protein that will match that region well.
link |
01:34:28.800
There's another technology called TALENs, which are basically just a different way of
link |
01:34:35.240
using proteins to sort of guide these cutters to a particular location of the genome.
link |
01:34:41.440
These require a massive team of engineers, of biological engineers to basically design
link |
01:34:46.520
a set of amino acids that will target a particular sequence of your genome.
link |
01:34:51.480
The reason why CRISPR is amazingly, awesomely revolutionary is because instead of having
link |
01:34:57.080
this team of engineers design a new set of proteins for every locus that you want to
link |
01:35:02.200
target, you just type it in your computer and you just synthesize an RNA guide.
link |
01:35:07.680
The beauty of CRISPR is not the cutting, it's not the fixing.
link |
01:35:11.100
All of that was there before.
link |
01:35:12.880
It's the guiding, and the only thing that changes is that it makes the guiding easier
link |
01:35:17.880
by sort of just typing in the RNA sequence, which then allows the system to sort of scan
link |
01:35:23.940
the DNA to find that.
link |
01:35:25.880
So the coding, the engineering of the cutter is easier.
link |
01:35:32.280
That's kind of similar to the story of deep learning versus old school machine learning.
link |
01:35:37.200
Some of the challenging parts are automated.
link |
01:35:41.080
But CRISPR is just one cutting technology, and part of the challenges and
link |
01:35:47.180
exciting opportunities of the field is to design different cutting technologies.
link |
01:35:53.020
So now this was a big parenthesis on CRISPR, but now when we were talking about perturbations,
link |
01:36:00.840
you basically now have the ability to not just look at correlation between enhancers
link |
01:36:04.720
and genes, but actually go and either destroy that enhancer and see if the gene changes
link |
01:36:10.760
in expression, or you can use the CRISPR targeting system to bring in not vandalism and cutting,
link |
01:36:20.000
but you can couple the CRISPR system with, and the CRISPR system is usually called CRISPR
link |
01:36:26.720
Cas9 because Cas9 is the protein that will then come and cut.
link |
01:36:30.920
But there's a version of that protein called dead Cas9 where the cutting part is deactivated.
link |
01:36:36.760
So you basically use the dead Cas9 to bring in an activator or to bring in a repressor.
link |
01:36:45.040
So you can now ask, is this enhancer changing that gene by taking this modified CRISPR,
link |
01:36:51.920
which is already modified from the bacteria to be used in humans, that you can now modify
link |
01:36:55.560
the Cas9 to be dead Cas9, and you can now further modify to bring in a regulator, and
link |
01:37:01.120
you can basically turn on or turn off that enhancer and then see what is the impact on
link |
01:37:05.000
that gene.
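Reading out "what is the impact on that gene" usually means comparing expression with and without the perturbation. A minimal sketch, assuming replicate counts per gene for control cells and for cells carrying a hypothetical dCas9-repressor at the enhancer; the gene labels, counts, threshold, and the name responsive_genes are all invented for illustration.

```python
import math
from statistics import mean

def responsive_genes(control, perturbed, threshold=1.0, pseudocount=1.0):
    """Genes whose mean expression shifts by at least `threshold` in log2 units.

    control / perturbed: {gene: [expression counts per replicate]}.
    The pseudocount avoids taking the log of zero for genes with no reads.
    """
    hits = {}
    for gene, ctrl_values in control.items():
        lfc = math.log2((mean(perturbed[gene]) + pseudocount) /
                        (mean(ctrl_values) + pseudocount))
        if abs(lfc) >= threshold:
            hits[gene] = lfc
    return hits


# Hypothetical counts after repressing an enhancer with dCas9.
control = {"GENE_A": [99, 101], "GENE_B": [500, 500]}
perturbed = {"GENE_A": [24, 24], "GENE_B": [505, 495]}
hits = responsive_genes(control, perturbed)
print(hits)
```

A drop in GENE_A but not GENE_B would point to GENE_A as the enhancer's target; because the enhancer was deliberately switched off, this is causal rather than correlational evidence.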
link |
01:37:06.620
So these are the four ways of linking the locus to the target gene, and that's step
link |
01:37:11.840
number five.
link |
01:37:14.240
Step number five is find the target gene, and step number six is what the heck does
link |
01:37:17.960
that gene do?
link |
01:37:19.560
You basically now go and manipulate that gene to basically see what are the processes that
link |
01:37:25.840
change, and you can basically ask, well, in this particular case, in the FTO locus, we
link |
01:37:32.400
found mesenchymal stem cells that are the progenitors of white fat and brown fat or
link |
01:37:38.160
beige fat.
link |
01:37:39.580
We found the rs1421085 nucleotide variant as the causal variant.
link |
01:37:44.880
We found this large enhancer, this master regulator.
link |
01:37:49.720
I like to call it OB1 for obesity one, like the strongest enhancer associated with it,
link |
01:37:55.720
and OB1 was kind of chubby as the actor.
link |
01:37:57.120
I don't know if you remember him.
link |
01:38:01.120
So you basically are using this Jedi mind trick to basically find out the location of
link |
01:38:07.320
the genome that is responsible, the enhancer that harbors it, the motif, the upstream regulator,
link |
01:38:14.120
which is ARID5B, for AT-rich interaction domain 5B.
link |
01:38:18.200
That's a protein that sort of comes and binds normally.
link |
01:38:21.040
That protein is normally a repressor.
link |
01:38:23.220
It represses this super enhancer, this massive 12,000-nucleotide master regulatory control
link |
01:38:28.520
region, and it turns off IRX3, which is a gene that's 600,000 nucleotides away, and IRX5,
link |
01:38:36.120
which is 1.2 million nucleotides away.
link |
01:38:38.480
So those things.
link |
01:38:39.480
And what's the effect of turning them off?
link |
01:38:40.760
That's exactly the next question.
link |
01:38:42.320
So step six is what do these genes actually do?
link |
01:38:45.520
So we then ask, what do IRX3 and IRX5 do?
link |
01:38:48.640
The first thing we did is look across individuals for individuals that had higher expression
link |
01:38:52.940
of IRX3 or lower expression of IRX3.
link |
01:38:55.520
And then we looked at the expression of all of the other genes in the genome.
link |
01:38:58.960
And we looked for simply correlation.
link |
01:39:01.580
And we found that IRX3 and IRX5 were both correlated positively with lipid metabolism and negatively
link |
01:39:09.820
with mitochondrial biogenesis.
link |
01:39:11.800
You're like, what the heck does that mean?
link |
01:39:16.400
Does this sound related to obesity?
link |
01:39:18.120
Not at all superficially, but lipid metabolism should, because lipids are these high-energy
link |
01:39:25.500
molecules that basically store energy as fat.
link |
01:39:28.560
So IRX3 and IRX5 are positively correlated with lipid metabolism.
link |
01:39:33.760
So that basically means that when they turn on, they turn
link |
01:39:39.000
on lipid metabolism.
link |
01:39:41.280
And they're negatively correlated with mitochondrial biogenesis.
link |
01:39:45.920
What do mitochondria do in this whole process?
link |
01:39:49.160
Again, small parenthesis, what are mitochondria?
link |
01:39:53.280
Mitochondria are little organelles.
link |
01:39:56.360
They are only found in eukaryotes.
link |
01:40:01.120
Eu means good, and karyon means nucleus.
link |
01:40:04.000
So truly like a true nucleus.
link |
01:40:05.920
So eukaryotes have a nucleus.
link |
01:40:07.880
Prokaryotes are before the nucleus.
link |
01:40:09.960
They don't have a nucleus.
link |
01:40:11.280
So eukaryotes have a nucleus, compartmentalization.
link |
01:40:16.840
Eukaryotes have also organelles.
link |
01:40:19.680
Some eukaryotes have chloroplasts.
link |
01:40:22.800
These are the plants, they photosynthesize.
link |
01:40:26.480
Some other eukaryotes like us have another type of organelle called mitochondria.
link |
01:40:33.480
These arose from an ancient species that we engulfed.
link |
01:40:40.360
This is an endosymbiosis event.
link |
01:40:44.360
Symbiosis: bio means life, sym means together.
link |
01:40:47.320
So symbiotes are things that live together.
link |
01:40:50.800
Endosymbiosis: endo means inside, so endosymbiosis means you live together holding the other
link |
01:40:54.240
one inside you.
link |
01:40:56.120
So the pre-eukaryotes engulfed an organism that was very good at energy production and
link |
01:41:07.240
that organism eventually shed most of its genome to now have only 13 protein-coding genes in the mitochondrial
link |
01:41:14.200
genome and those 13 genes are all involved in energy production, the electron transport
link |
01:41:22.400
chain.
link |
01:41:23.400
So basically electrons are these massive super energy rich molecules.
link |
01:41:28.560
We basically have these organelles that produce energy and when your muscle exercises, you
link |
01:41:35.760
basically multiply your mitochondria.
link |
01:41:37.800
You basically sort of, you know, use more and more mitochondria and that's how you get
link |
01:41:42.960
beefed up.
link |
01:41:43.960
So basically the muscle sort of learns how to generate more energy.
link |
01:41:47.840
So basically every single time your muscles will, you know, overnight regenerate and sort
link |
01:41:51.680
of become stronger and amplify their mitochondria and so forth.
link |
01:41:55.240
So what does mitochondria do?
link |
01:41:56.480
The mitochondria produce the energy that you use to do any kind of task.
link |
01:42:02.200
When you're thinking, you're using energy.
link |
01:42:05.000
This energy comes from mitochondria.
link |
01:42:06.960
Your neurons have mitochondria all over the place.
link |
01:42:10.040
Basically these mitochondria can multiply as organelles and they can be spread along the
link |
01:42:13.340
body of your muscle.
link |
01:42:15.040
Some of your muscle cells have actually multiple nuclei, they're polynucleated, but they also
link |
01:42:18.840
have multiple mitochondria to basically deal with the fact that your muscle is enormous.
link |
01:42:24.380
Your muscle cells can sort of span these super, super long lengths, and you need energy throughout the
link |
01:42:28.040
length of your muscle.
link |
01:42:29.360
So that's why you have mitochondria throughout the length and you also need transcription
link |
01:42:32.340
through the length so you have multiple nuclei as well.
link |
01:42:35.080
So these two processes, lipids store energy, what do mitochondria do?
link |
01:42:42.060
So there's a process known as thermogenesis.
link |
01:42:46.040
Thermo means heat, genesis means generation.
link |
01:42:48.520
Thermogenesis is the generation of heat.
link |
01:42:50.600
Remember that bathtub with the in and out?
link |
01:42:55.160
That's the equation that everybody's focused on.
link |
01:42:57.160
So how much energy do you consume?
link |
01:42:58.860
How much energy do you burn?
link |
01:43:01.000
But in every thermodynamic system, there are three parts to the equation.
link |
01:43:06.060
There's energy in, energy out, and energy lost.
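Written out, the three-part bookkeeping of the bathtub is a simplified energy balance. This is a sketch that lumps all expenditure other than heat into the "out" term, not a full physiological model: with the same intake and the same activity, turning thermogenesis up raises the heat term and so lowers what ends up stored as fat.

```latex
\Delta E_{\text{stored}} = E_{\text{in}} - E_{\text{out}} - E_{\text{heat}}
```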
link |
01:43:10.900
Any machine has loss of energy.
link |
01:43:14.680
How do you lose energy?
link |
01:43:15.720
You emanate heat.
link |
01:43:17.600
So heat is energy loss.
link |
01:43:20.000
So there's...
link |
01:43:24.760
Which is where the thermogenesis comes in.
link |
01:43:26.600
Thermogenesis is actually a regulatory process that modulates the third component of the
link |
01:43:32.240
thermodynamic equation.
link |
01:43:34.060
You can basically control thermogenesis explicitly.
link |
01:43:37.240
You can turn on and turn off thermogenesis.
link |
01:43:39.080
And that's where the mitochondria come into play.
link |
01:43:41.400
Exactly.
link |
01:43:42.400
So IRX3 and IRX5 turn out to be the master regulators of a process of thermogenesis versus
link |
01:43:49.600
lipogenesis, the generation of fat.
link |
01:43:52.360
So IRX3 and IRX5 in most people burn heat, burn calories as heat.
link |
01:43:58.720
So when you eat too much, you just burn it off in your fat cells.
link |
01:44:02.720
So that bathtub has basically a sort of dissipation knob that most people are able to turn on.
link |
01:44:11.140
I am unable to turn that on because I am a homozygous carrier for the mutation that changes
link |
01:44:17.720
a T into a C in the rs1421085 allele and locus, a SNP.
link |
01:44:24.560
I have the risk allele twice from my mom and from my dad.
link |
01:44:28.320
So I'm unable to thermogenize.
link |
01:44:31.880
I'm unable to turn on thermogenesis through IRX3 and IRX5 because the regulator that normally
link |
01:44:37.320
binds here, ARID5B, can no longer bind because it's an AT-rich interaction domain protein.
link |
01:44:42.720
And as soon as I change the T into a C, it can no longer bind because it's no longer
link |
01:44:46.440
AT rich.
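Why a single T to C change can knock out binding of an AT-rich binder can be shown with a toy rule. This is a minimal sketch under loud assumptions: the sequence is invented (it is not the real sequence around rs1421085), and "binding" is reduced to a hypothetical requirement of eight consecutive A or T bases. Real binding preferences are usually scored with position weight matrices rather than a hard pattern.

```python
import re

# Hypothetical binding rule for illustration: eight consecutive A/T bases.
AT_RICH_SITE = re.compile(r"[AT]{8}")

def at_fraction(seq):
    """Fraction of bases that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)

def has_at_rich_site(seq):
    return AT_RICH_SITE.search(seq.upper()) is not None

def substitute(seq, pos, new_base):
    """Return seq with a single-nucleotide change at 0-based position pos."""
    return seq[:pos] + new_base + seq[pos + 1:]


reference = "GCGATTAATTAGCG"             # made-up sequence, not the real locus
variant = substitute(reference, 5, "C")  # a single T -> C change
print(at_fraction(reference), has_at_rich_site(reference))
print(at_fraction(variant), has_at_rich_site(variant))
```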
link |
01:44:47.440
But doesn't that mean that you're able to use the energy more efficiently?
link |
01:44:52.280
You're not generating heat or is that?
link |
01:44:54.120
That means I can eat less and get around just fine.
link |
01:44:56.920
Yes.
link |
01:44:57.920
Yeah.
link |
01:44:58.920
So that's a feature actually.
link |
01:44:59.920
It's a feature in a food scarce environment.
link |
01:45:02.040
Yeah.
link |
01:45:03.040
But if we're all starving, I'm doing great.
link |
01:45:05.160
If we all have access to massive amounts of food, I'm obese basically.
link |
01:45:09.360
That takes us through the entire process of understanding why the mitochondria and
link |
01:45:14.920
the lipids, even though distant, are both somehow involved.
link |
01:45:18.600
Different sides of the same coin.
link |
01:45:20.760
And you basically choose to store energy or you can choose to burn energy.
link |
01:45:24.000
And then all of that is involved in the puzzle of obesity.
link |
01:45:27.800
And that's what's fascinating, right?
link |
01:45:29.760
Here we are in 2007, discovering the strongest genetic association with obesity and knowing
link |
01:45:35.360
nothing about how it works for almost 10 years.
link |
01:45:39.460
For 10 years, everybody focused on this FTO gene and they were like, oh, it must have
link |
01:45:43.840
to do something with RNA modification.
link |
01:45:46.240
And it's like, no, it has nothing to do with the function of FTO.
link |
01:45:50.760
It has everything to do with all of these other processes.
link |
01:45:53.880
And suddenly the moment you solve that puzzle, which is a multiyear effort by the way, a
link |
01:45:58.680
tremendous effort by Melina Claussnitzer and many, many others.
link |
01:46:01.880
So this tremendous effort basically led us to recognize this circuitry.
link |
01:46:07.160
You went from having some 89 common variants associated in that region of the DNA sitting
link |
01:46:12.500
on top of this gene to knowing the whole circuitry.
link |
01:46:17.840
When you know the circuitry, you can now go crazy.
link |
01:46:21.160
You can now start intervening at every level.
link |
01:46:24.480
You can start intervening at the ARID5B level.
link |
01:46:27.240
You can start intervening with CRISPR Cas9 at the single SNP level.
link |
01:46:31.280
You can start intervening at IRX3 and IRX5 directly there.
link |
01:46:34.860
You can start intervening at the thermogenesis level because you know the pathway.
link |
01:46:38.400
You can start intervening at the differentiation level where the decision to make either white
link |
01:46:45.280
fat or beige fat, the energy burning beige fat is made developmentally in the first three
link |
01:46:51.500
days of differentiation of your adipocytes.
link |
01:46:54.040
So as they're differentiating, you basically can choose to make fat burning machines or
link |
01:46:57.720
fat storing machines.
link |
01:46:59.320
And sort of that's how you populate your fat.
link |
01:47:02.320
You basically can now go in pharmaceutically and do all of that.
link |
01:47:05.880
And in our paper, we actually did all of that.
link |
01:47:09.400
We went in and manipulated every single aspect.
link |
01:47:12.320
At the nucleotide level, we used CRISPR-Cas9 genome editing to basically take primary adipocytes
link |
01:47:18.200
from risk and non-risk individuals and show that by editing that one nucleotide out of
link |
01:47:24.080
3.2 billion nucleotides in the human genome, you could then flip between an obese phenotype
link |
01:47:29.600
and a lean phenotype like a switch.
link |
01:47:31.500
You can basically take my cells that are non-thermogenizing and just flip them into thermogenizing
link |
01:47:36.240
cells by changing one nucleotide.
link |
01:47:38.640
It's mind boggling.
link |
01:47:40.080
It's so inspiring that this puzzle could be solved in this way and it feels within reach
link |
01:47:44.880
to then be able to crack the problem of some of these diseases.
link |
01:47:50.560
What are the technologies, the tools that came along that made this possible?
link |
01:48:00.480
What are you excited about?
link |
01:48:01.980
Maybe if we just look at the buffet of things that you've kind of mentioned, what's involved?
link |
01:48:08.080
What should we be excited about?
link |
01:48:09.520
What are you excited about?
link |
01:48:11.460
I love that question because there's so much ahead of us.
link |
01:48:14.040
There's so, so much.
link |
01:48:18.600
So basically solving that one locus required massive amounts of knowledge that we have
link |
01:48:24.000
been building across the years through the epigenome, through the comparative genomics
link |
01:48:28.220
to find out the causal variant and the controller regulatory motif through the conserved circuitry.
link |
01:48:35.400
It required knowing this regulatory genomic wiring.
link |
01:48:38.580
It required Hi-C of these sort of topologically associated domains to basically find these
link |
01:48:42.980
long range interaction.
link |
01:48:44.600
It required eQTLs of these sort of genetic perturbation of these intermediate gene phenotypes.
link |
01:48:51.160
All of the arsenal of tools that I've been describing was put together for
link |
01:48:55.640
one locus.
link |
01:48:57.240
And this was a massive team effort, huge investment in time, energy, money, effort, intellectual,
link |
01:49:05.840
everything.
link |
01:49:06.840
You're referring to, I'm sorry, just for the obesity one.
link |
01:49:09.640
Yeah, this one paper.
link |
01:49:10.640
This one single paper.
link |
01:49:11.640
This one single locus.
link |
01:49:12.640
I would like to say that this is a paper about one nucleotide in the human genome, about
link |
01:49:16.640
one bit of information, C versus T in the human genome.
link |
01:49:20.560
That's one bit of information and we have 3.2 billion nucleotides to go through.
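The "one bit" framing follows from counting possible states: a position that can be either C or T has two states, which is log2(2) = 1 bit. A minimal worked sketch, assuming a plain uncompressed encoding of a single copy of the 3.2 billion nucleotides, where any of four bases costs log2(4) = 2 bits each.

```python
import math

genome_length = 3_200_000_000           # nucleotides, the figure quoted above
bits_per_nucleotide = math.log2(4)      # four possible bases -> 2 bits
bits_per_biallelic_site = math.log2(2)  # two possible alleles (C or T) -> 1 bit

total_bits = genome_length * bits_per_nucleotide
total_megabytes = total_bits / 8 / 1_000_000

print(bits_per_biallelic_site)  # 1.0
print(total_megabytes)          # 800.0
```

So under this assumption the whole sequence fits in roughly 800 MB, while the variant studied here is a single bit of it.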
link |
01:49:25.320
So how do you do that systematically?
link |
01:49:29.240
I am so excited about the next phase of research because the technologies that my group and
link |
01:49:35.000
many other groups have developed allow us to now do this systematically, not just one
link |
01:49:40.080
locus at a time, but thousands of loci at a time.
link |
01:49:45.120
So let me describe some of these technologies.
link |
01:49:48.000
The first one is automation and robotics.
link |
01:49:52.420
So basically, you know, we talked about how you can take all of these molecules and see
link |
01:49:58.240
which of these molecules are targeting each of these genes and what do they do?
link |
01:50:02.200
So you can basically now screen through millions of molecules through thousands and thousands
link |
01:50:07.700
and thousands of plates, each of which has thousands and thousands and thousands of molecules,
link |
01:50:12.880
every single time testing, you know, all of these genes and asking which of these molecules
link |
01:50:20.560
perturb these genes.
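As a rough illustration of how a plate-based screen turns thousands of wells into a short list of candidate molecules, here is a minimal Python sketch that scores each well against its own plate with a z-score and flags outliers. The well names, readout values, and the cutoff of 3 are hypothetical; real screening pipelines usually normalize against on-plate controls and use more robust statistics.

```python
from statistics import mean, stdev

def plate_zscores(readouts):
    """Score each well against its own plate: (value - plate mean) / plate SD."""
    values = list(readouts.values())
    mu, sd = mean(values), stdev(values)
    return {well: (value - mu) / sd for well, value in readouts.items()}

def call_hits(readouts, threshold=3.0):
    """Return the wells whose absolute z-score reaches a (hypothetical) cutoff."""
    z = plate_zscores(readouts)
    return sorted(well for well, score in z.items() if abs(score) >= threshold)

# Toy plate: 19 wells near baseline, one molecule that strongly perturbs the reporter.
plate = {f"A{i}": 100.0 + (i % 5) - 2 for i in range(1, 20)}
plate["A20"] = 500.0
print(call_hits(plate))  # ['A20']
```

Scoring within each plate is one simple way to absorb plate-to-plate differences in overall signal.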
link |
01:50:22.000
So that's technology number one, automation and robotics.
link |
01:50:25.880
Technology number two is parallel readouts.
link |
01:50:29.280
So instead of perturbing one locus and then asking if I use CRISPR Cas9 on this enhancer
link |
01:50:35.880
to basically use dCas9 to turn on or turn off the enhancer, or if I use CRISPR Cas9
link |
01:50:41.280
on the SNP to basically change that one SNP at a time, then what happens?
link |
01:50:46.620
But we have 120,000 disease associated SNPs that we want to test.
link |
01:50:52.760
We don't want to spend 120,000 years doing it.
link |
01:50:57.220
So what do we do?
link |
01:50:58.920
We've basically developed this technology for massively parallel reporter assays, MPRA.
link |
01:51:07.240
So in collaboration with Tarjei Mikkelsen and Eric Lander. I mean, Jay Shendure's group has
link |
01:51:11.240
done a lot of that.
link |
01:51:12.240
So there are a lot of groups that basically have developed technologies for testing 10,000
link |
01:51:19.380
genetic variants at a time.
link |
01:51:21.420
How do you do that?
link |
01:51:23.000
You know, we talked about microarray technology, the ability to synthesize these huge microarrays
link |
01:51:28.880
that allow you to do all kinds of things like measure gene expression by hybridization,
link |
01:51:33.880
by measuring the genotype of a person, by looking at hybridization with one version
link |
01:51:38.100
with a T versus the other version with a C, and then sort of figuring out that I am a
link |
01:51:43.400
risk carrier for obesity based on this differential hybridization in my genome that says, oh,
link |
01:51:49.820
you seem to only have this allele or you seem to have that allele.
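The genotyping idea above, comparing how strongly a person's DNA hybridizes to a probe for one allele versus the other, can be sketched in a few lines of Python for a C/T site. The intensity values and the 0.2/0.8 cutoffs are made-up assumptions; production genotype callers typically cluster intensities across many samples rather than use fixed thresholds.

```python
def allele_fraction(t_signal, c_signal):
    """Fraction of the total hybridization signal that comes from the T-allele probe."""
    total = t_signal + c_signal
    if total <= 0:
        raise ValueError("no hybridization signal at this site")
    return t_signal / total

def call_genotype(t_signal, c_signal, low=0.2, high=0.8):
    """Call CC, CT, or TT from two probe intensities using hypothetical cutoffs."""
    fraction = allele_fraction(t_signal, c_signal)
    if fraction < low:
        return "CC"  # almost all signal on the C probe
    if fraction > high:
        return "TT"  # almost all signal on the T probe
    return "CT"      # both probes light up: one copy of each allele

print(call_genotype(50, 950), call_genotype(500, 480), call_genotype(900, 30))  # CC CT TT
```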
link |
01:51:53.320
These can also be used to systematically synthesize small fragments of DNA.
link |
01:51:59.400
So you can basically synthesize these 150 nucleotide long fragments across 450,000 spots
link |
01:52:07.800
at a time.
link |
01:52:10.240
You can now take the result of that synthesis, which basically works through all of these
link |
01:52:15.820
sort of layers of adding one nucleotide at a time.
link |
01:52:18.760
You can basically just type it into your computer and order it, and you can basically order
link |
01:52:24.000
10,000 or 100,000 of these small DNA segments at a time.
link |
01:52:30.740
And that's where awesome molecular biology comes in.
link |
01:52:33.360
You can basically take all these segments, have a common start and end barcode or sort
link |
01:52:38.840
of like adapters, just like pieces of a puzzle.
link |
01:52:42.120
You can make the same end piece and the same start piece for all of them.
link |
01:52:48.000
And you can now use plasmids, which are these extrachromosomal small DNA circular segments
link |
01:52:57.960
that are basically inhabiting all our, all our genomes.
link |
01:53:00.560
We basically have, you know, plasmids floating around, and bacteria use plasmids
link |
01:53:05.200
for transferring DNA.
link |
01:53:07.060
And that's where they put a lot of antibiotic resistance genes.
link |
01:53:10.720
So they can easily transfer them from one bacterium to the other.
link |
01:53:14.200
After one bacterium evolves a gene to be resistant to a particular antibiotic, it basically says
link |
01:53:20.280
to all its friends, Hey, here's that sort of DNA piece.
link |
01:53:24.760
We can now co-opt these plasmids into human cells.
link |
01:53:28.440
You can basically make a human cell culture and add plasmids to that human cell culture
link |
01:53:34.000
that contain the things that you want to test.
link |
01:53:38.120
You now have this library of 450,000 elements.
link |
01:53:41.320
You can insert them each into the common plasmid and then test them in millions of cells in
link |
01:53:47.880
parallel.
link |
01:53:48.880
And the common plasmid is all the same before you add it.
link |
01:53:51.160
Exactly.
link |
01:53:52.160
The rest of the plasmid is the same.
link |
01:53:53.300
So it's, it's called an episomal reporter assay.
link |
01:53:57.640
Episomal means not inside the genome.
link |
01:53:59.720
It's sort of outside the chromosomes.
link |
01:54:01.560
So it's an episomal assay that allows you to have a variable region where you basically
link |
01:54:06.200
test 10,000 different enhancers and you have a common region which basically has the same
link |
01:54:11.720
reporter gene.
link |
01:54:13.720
You now can do some very cool molecular biology.
link |
01:54:16.600
You can basically take the 450,000 elements that you've generated and you have a piece
link |
01:54:21.960
of the puzzle here, piece of the puzzle here, which is identical.
link |
01:54:24.440
So they're compatible with that plasmid.
link |
01:54:27.060
You can chop them up in the middle to separate a barcode reporter from the enhancer and in
link |
01:54:32.840
the middle put the same gene again using the same piece of the puzzle.
link |
01:54:36.920
You now can have a barcode readout of what is the impact of 10,000 different versions
link |
01:54:42.960
of an enhancer on gene expression.
link |
01:54:46.600
So we're not doing one experiment, we're doing 10,000 experiments.
link |
01:54:50.680
And those 10,000 can be 5,000 different loci, each of them in two versions, risk
link |
01:54:58.580
or non-risk.
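Here is a minimal Python sketch of the barcode readout for one locus tested in its risk and non-risk versions. It assumes each element's activity is summarized as log2 of its summed RNA barcode reads over its summed plasmid (DNA) barcode reads. The counts and the pseudocount are invented, and library-size normalization is left out. Scaling one whole library by a constant shifts both alleles equally, so their difference is essentially unchanged.

```python
import math

def element_activity(rna_counts, dna_counts, pseudocount=1.0):
    """log2 of (RNA barcode reads / DNA barcode reads) summed over one element's barcodes."""
    return math.log2((sum(rna_counts) + pseudocount) / (sum(dna_counts) + pseudocount))

def allelic_effect(risk, non_risk):
    """Risk-minus-non-risk activity; each argument is (rna_counts, dna_counts)."""
    return element_activity(*risk) - element_activity(*non_risk)

# Hypothetical counts for one locus: two barcodes per allele.
effect = allelic_effect(
    risk=([200, 220], [100, 110]),      # RNA reads about 2x the plasmid (DNA) reads
    non_risk=([100, 110], [100, 110]),  # RNA reads about equal to DNA reads
)
print(round(effect, 2))  # 1.0
```

A difference of about 1 on the log2 scale corresponds to roughly two-fold higher reporter expression from the risk version in this toy example.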
link |
01:55:00.260
I can now test tens of thousands.
link |
01:55:01.920
Just a little hypothesis.
link |
01:55:02.920
Exactly.
link |
01:55:03.920
And then you can do 10,000 and we can test 10,000 hypotheses at once.
link |
01:55:08.880
How hard is it to generate those 10,000?
link |
01:55:11.360
Trivial.
link |
01:55:12.360
Trivial.
link |
01:55:13.360
But it's biology.
link |
01:55:14.360
No, no.
link |
01:55:15.360
Generating the 10,000 is trivial because you basically add, it's biotechnology.
link |
01:55:20.740
You basically have these arrays that add one nucleotide at a time at every spot.
link |
01:55:26.560
So it's printing and so you're able to, you're able to control.
link |
01:55:30.680
Yeah.
link |
01:55:31.680
Is it super costly?
link |
01:55:32.800
Is it?
link |
01:55:33.800
10,000 bucks.
link |
01:55:34.800
So this isn't millions.
link |
01:55:35.800
10,000 bucks for 10,000 experiments sounds like the right, you know.
link |
01:55:39.200
I mean, so that's super, that's exciting because you don't have to do one thing at a time.
link |
01:55:44.100
You can now use that technology, these massively parallel reporter assays to test 10,000 locations
link |
01:55:49.280
at a time.
link |
01:55:51.440
We've made multiple modifications to that technology.
link |
01:55:55.160
One was Sharpr-MPRA, which stands for, you know, basically getting a higher resolution
link |
01:56:04.080
view by tiling these, these elements so you can see where along the region of control
link |
01:56:14.800
are they acting.
link |
01:56:16.140
And we made another modification called HiDRA, for high, you know, definition regulatory
link |
01:56:23.240
annotation or something like that, which basically allows you to test 7 million of these at a
link |
01:56:30.080
time by sort of cutting them directly from the DNA.
link |
01:56:32.960
So instead of synthesizing, which basically has the limit of 450,000 that you can synthesize
link |
01:56:37.420
at a time, we basically said, Hey, if we want to test all accessible regions of the genome,
link |
01:56:42.600
let's just do an experiment that cuts accessible regions.
link |
01:56:45.620
Let's take those accessible regions, put them all with the same end joints of the puzzles,
link |
01:56:51.520
and then now use those to create a much, much larger array of things that you can test.
link |
01:56:59.680
And then tiling all of these regions, you can then pinpoint what are the driver nucleotides,
link |
01:57:04.160
what are the elements, how are they acting across 7 million experiments at a time.
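The tiling idea can be shown with a deliberately simple averaging scheme in Python. Overlapping fragments let the nucleotides shared by all the active fragments stand out. This is not the statistical model behind the tiling-based assays described above. The coordinates and activities are invented, and the sketch only shows why overlap sharpens resolution.

```python
def nucleotide_scores(region_length, tiles):
    """Average the activity of every tile covering each position.

    tiles: list of (start, end, activity) with half-open [start, end) coordinates.
    Positions that no tile covers get None.
    """
    totals = [0.0] * region_length
    counts = [0] * region_length
    for start, end, activity in tiles:
        for pos in range(max(0, start), min(region_length, end)):
            totals[pos] += activity
            counts[pos] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]

# Hypothetical 10-bp region: only the tiles overlapping positions 4-5 are active.
tiles = [(0, 4, 0.0), (2, 6, 1.0), (4, 8, 1.0), (6, 10, 0.0)]
scores = nucleotide_scores(10, tiles)
print(scores)  # [0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 0.5, 0.5, 0.0, 0.0]
```

No single tile is 2 bp long, but the positions where the active tiles overlap (4 and 5) come out with the highest scores.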
link |
01:57:07.520
So basically this is all the same family of technology where you're basically using these
link |
01:57:12.580
parallel readouts of the barcodes.
link |
01:57:15.900
And then to do this, we used a technology called STARR-seq, for self-transcribing reporter
link |
01:57:23.240
assays, a technology developed by Alex Stark, my former postdoc, who's now a PI over in Vienna.
link |
01:57:30.140
So we basically coupled the STARR-seq, the self-transcribing reporters where the enhancer
link |
01:57:37.240
can be part of the gene itself.
link |
01:57:39.040
So instead of having a separate barcode, that enhancer basically acts to turn on the gene
link |
01:57:43.600
and it's transcribed as part of the gene.
link |
01:57:46.080
So you don't have to have the two separate parts.
link |
01:57:47.640
Exactly.
link |
01:57:48.640
So you can just read them directly.
link |
01:57:49.640
So there are constant improvements in this whole process.
link |
01:57:52.680
By the way, generating all these options, is it basically brute force?
link |
01:57:57.160
How much is human intuition?
link |
01:57:58.680
Oh gosh, of course it's human intuition and human creativity and incorporating all of
link |
01:58:04.040
the input data sets.
link |
01:58:06.040
Because again, the genome is enormous.
link |
01:58:08.440
3.2 billion, you don't want to test that.
link |
01:58:11.040
You basically use all of these tools that I've talked about already.
link |
01:58:14.280
You generate your top favorite 10,000 hypotheses, and then you go and test all 10,000.
link |
01:58:19.920
And then from what comes out, you can then go to the next step.
link |
01:58:24.080
So that's technology number two.
link |
01:58:25.920
So technology number one is robotics, automation, where you have thousands of wells and you
link |
01:58:30.440
constantly test them.
link |
01:58:32.140
The second technology is instead of having wells, you have these massively parallel readouts
link |
01:58:37.320
in sort of these pooled assays.
link |
01:58:40.000
The third technology is coupling CRISPR perturbations with these single cell RNA readouts.
link |
01:58:51.260
So let me make another parenthesis here to describe now single cell RNA sequencing.
link |
01:58:57.880
So what does single cell RNA sequencing mean?
link |
01:58:59.720
So RNA sequencing is what has been traditionally used, well, traditionally the last 20 years,
link |
01:59:07.760
ever since the advent of next generation sequencing.
link |
01:59:10.200
So basically, before that, RNA expression profiling was based on these microarrays.
link |
01:59:14.620
The next technology after that was based on sequencing.
link |
01:59:17.500
So you chop up your RNA and you just sequence small molecules, just like you would sequence
link |
01:59:22.840
a genome, basically reverse transcribe the small RNAs into DNA, and you sequence that
link |
01:59:28.040
DNA in order to get the number of sequencing reads corresponding to the expression level
link |
01:59:35.600
of every gene in the genome.
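Turning sequencing reads into an expression level per gene can be sketched in Python as counting reads per gene and scaling by sequencing depth. Counts per million (CPM) is one simple choice. The sketch assumes the reads have already been aligned and assigned to genes, uses placeholder gene names, and ignores refinements such as gene-length correction.

```python
from collections import Counter

def counts_per_million(read_genes):
    """Count reads per gene and scale so one sample's counts sum to one million."""
    counts = Counter(read_genes)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Toy sample: each list entry is the gene an aligned read was assigned to.
reads = ["GENE_A", "GENE_A", "GENE_A", "GENE_B"]
print(counts_per_million(reads))  # {'GENE_A': 750000.0, 'GENE_B': 250000.0}
```

Scaling by depth lets samples sequenced to different depths be compared on the same footing.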
link |
01:59:37.480
You now have RNA sequencing.
link |
01:59:39.680
How do you go to single cell RNA sequencing?
link |
01:59:42.520
That technology also went through stages of evolution.
link |
01:59:45.880
The first was microfluidics.
link |
01:59:48.120
You basically had these, or even chambers, you basically had these ways of isolating
link |
01:59:52.940
individual cells, putting them into a well for every one of these cells.
link |
01:59:57.320
So you have 384-well plates and you now do 384 parallel reactions to measure the expression
link |
02:00:03.280
of 384 cells.
link |
02:00:05.660
That sounds amazing and it was amazing, but we want to do a million cells.
link |
02:00:11.320
How do you go from these wells to a million cells?
link |
02:00:14.120
You can't.
link |
02:00:15.640
So what the next technology was after that is instead of using a well for every reaction,
link |
02:00:21.660
you now use a little droplet for every reaction.
link |
02:00:26.280
So you use micro droplets as reaction chambers to basically amplify RNA.
link |
02:00:33.660
So here's the idea.
link |
02:00:34.660
You basically have microfluidics where you basically have every single cell coming down
link |
02:00:39.280
one tube in your microfluidics and you have little bubbles getting created in the other
link |
02:00:44.040
way with specific primers that mark every cell with its own barcode.
link |
02:00:49.360
You basically couple the two and you end up with little bubbles that have a cell and tons
link |
02:00:55.040
of markers for that cell.
link |
02:00:57.400
You now mark up all of the RNA for that one cell with the same exact barcode and you then
link |
02:01:03.880
lyse all of the droplets and you sequence the heck out of that and you have for every
link |
02:01:09.360
RNA molecule, a unique identifier that tells you what cell it was in.
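After the droplets are broken and everything is sequenced together, the cell barcode on each read is what lets you rebuild one expression profile per cell. The Python sketch below groups reads by cell and gene. It assumes each read also carries a unique molecular identifier (UMI), an extra tag many droplet protocols add so that PCR copies of one captured molecule are counted once; that detail goes beyond the description above. The short barcodes and gene names are hypothetical, and real pipelines also correct sequencing errors in barcodes.

```python
from collections import defaultdict

def count_matrix(reads):
    """reads: iterable of (cell_barcode, umi, gene) tuples from one pooled sequencing run.

    Returns {cell_barcode: {gene: number of distinct UMIs}} so that PCR copies
    of the same captured molecule are counted once.
    """
    molecules = defaultdict(set)  # (cell, gene) -> UMIs observed
    for cell, umi, gene in reads:
        molecules[(cell, gene)].add(umi)
    matrix = defaultdict(dict)
    for (cell, gene), umis in molecules.items():
        matrix[cell][gene] = len(umis)
    return dict(matrix)

reads = [
    ("AAAC", "u1", "GENE_A"),
    ("AAAC", "u1", "GENE_A"),  # PCR duplicate of the read above
    ("AAAC", "u2", "GENE_A"),
    ("GGTT", "u1", "GENE_B"),
]
print(count_matrix(reads))  # {'AAAC': {'GENE_A': 2}, 'GGTT': {'GENE_B': 1}}
```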
link |
02:01:12.880
That is such good engineering, microfluidics and using some kind of primer to put a label
link |
02:01:20.840
on the thing.
link |
02:01:21.840
I mean, you're making it sound easy.
link |
02:01:24.080
I assume it's beautiful, but it's gorgeous.
link |
02:01:27.400
So there's the next generation.
link |
02:01:29.560
So that's the second generation.
link |
02:01:31.120
Next generation is forget the microfluidics altogether.
link |
02:01:34.000
Just use big bottles.
link |
02:01:35.000
How can you possibly do that with big bottles?
link |
02:01:37.960
So here's the idea.
link |
02:01:39.400
You dissociate all of your cells, or all of your nuclei in the case of complex cells like brain
link |
02:01:43.680
cells that are very long and sticky, where you can't do that with whole cells.
link |
02:01:48.240
If you have blood cells or if you have neuronal nuclei or brain nuclei, you can basically
link |
02:01:52.520
dissociate let's say a million cells.
link |
02:01:56.160
You now want to add a unique barcode, a unique barcode in each one of a million cells using
link |
02:02:01.720
only big bottles.
link |
02:02:02.720
How can you possibly do that?
link |
02:02:04.440
Sounds crazy, but here's the idea.
link |
02:02:07.320
You use a hundred of these bottles, you randomly shuffle all your million cells and you throw
link |
02:02:13.880
them into those hundred bottles randomly, completely randomly.
link |
02:02:17.180
You add one barcode out of a hundred to every one of the cells.
link |
02:02:21.560
You then take them all out.
link |
02:02:23.560
You shuffle them again and you throw them again into the same hundred bottles.
link |
02:02:28.440
But now in a different randomization and you add a second barcode.
link |
02:02:33.960
So every cell now has two barcodes.
link |
02:02:36.880
You take them out again, you shuffle them and you throw them back in.
link |
02:02:40.280
A third barcode is added randomly from the same hundred barcodes.
link |
02:02:47.480
You've now labeled every cell probabilistically based on the unique path that it took: which
link |
02:02:53.920
of a hundred bottles it went to the first time, which of a hundred bottles the second
link |
02:02:56.880
time and which of a hundred bottles the third time.
link |
02:03:00.160
A hundred times a hundred times a hundred is a million unique barcodes in every single
link |
02:03:05.240
one of these cells without ever using microfluidics.
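The bottle-shuffling scheme is easy to simulate in Python. The sketch uses the 100 bottles and 3 rounds from the description above, and each cell's barcode is simply the sequence of bottles it visited. The seed and cell counts are arbitrary choices for illustration.

```python
import random
from collections import Counter

def split_pool_barcodes(n_cells, n_bottles=100, n_rounds=3, seed=0):
    """Each round, every cell lands in one random bottle; its barcode is the tuple of bottles."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(n_bottles) for _ in range(n_rounds)) for _ in range(n_cells)]

def fraction_unique(barcodes):
    """Fraction of cells whose barcode no other cell shares."""
    counts = Counter(barcodes)
    return sum(1 for b in barcodes if counts[b] == 1) / len(barcodes)

print(100 ** 3)  # 1000000 possible bottle paths
barcodes = split_pool_barcodes(10_000)
print(fraction_unique(barcodes))  # roughly 0.99 for this load
```

Because bottles are chosen at random, two cells can occasionally take the same path. With n cells spread over N possible paths, the chance that a given cell's path is shared with no other cell is about (1 - 1/N)^(n-1), which is approximately e^(-n/N). Loading well fewer cells than there are paths, or adding another round, is one common way to keep such collisions rare.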
link |
02:03:09.480
Very clever.
link |
02:03:10.480
It's beautiful, right?
link |
02:03:11.480
From a computer science perspective, that's very clever.
link |
02:03:12.880
Yeah.
link |
02:03:13.880
So you now have the single cell sequencing technology.
link |
02:03:16.160
You can use the wells, you can use the bubbles or you can use the bottles and you have way
link |
02:03:22.040
The bubbles still sound pretty damn cool.
link |
02:03:23.680
The bubbles are awesome.
link |
02:03:24.680
And that's basically the main technology that we're using.
link |
02:03:26.640
So the bubbles is the main technology.
link |
02:03:29.680
So there are kits now that companies just sell to basically carry out single cell RNA
link |
02:03:34.360
sequencing, where for $2,000 you can basically get 10,000 cells from one
link |
02:03:40.240
sample.
link |
02:03:42.560
And for every one of those cells, you basically have the transcription of thousands of genes.
link |
02:03:49.680
And you know, of course the data for any one cell is noisy, but being computer scientists,
link |
02:03:54.360
we can aggregate the data from all of the cells together across thousands of individuals
link |
02:03:58.640
together to basically make very robust inferences.
link |
02:04:02.120
Okay.
link |
02:04:03.120
So the third technology is basically single cell RNA sequencing that allows you to now
link |
02:04:07.160
start asking not just what is the brain expression level difference of that genetic variant,
link |
02:04:14.400
but what is the expression difference of that one genetic variant across every single subtype
link |
02:04:20.000
of brain cell?
link |
02:04:21.720
How is the variance changing?
link |
02:04:24.460
You can't just, you know, with a brain sample, you can only ask about the mean, what is the
link |
02:04:29.260
average expression?
link |
02:04:30.840
If I instead have 3000 cells that are neurons, I can ask not just what is the neuronal expression.
link |
02:04:38.280
I can say for layer five excitatory neurons of which I have, I don't know, 300 cells,
link |
02:04:44.240
what is the variance that this genetic variant has?
link |
02:04:48.240
So suddenly it's amazingly more powerful.
link |
02:04:51.000
I can basically start asking about this middle layer of gene expression at unprecedented
link |
02:04:55.240
levels.
link |
02:04:56.240
So when you look at the average, it washes out some potentially important signal that
link |
02:05:01.600
corresponds to ultimately the disease.
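A toy Python calculation makes the washing-out effect concrete. It assumes, purely for illustration, that a variant doubles a gene's expression only in one neuron subtype making up 10% of the cells. The cell-type label "L5_exc" and all numbers are hypothetical.

```python
from statistics import mean

def make_sample(l5_expression, n_other=90, n_l5=10):
    """Toy sample: every non-L5 cell expresses the gene at 10.0."""
    return [("other", 10.0)] * n_other + [("L5_exc", l5_expression)] * n_l5

def bulk_mean(cells):
    """What a bulk (ground-up tissue) measurement effectively reports."""
    return mean(expression for _, expression in cells)

def pseudobulk(cells):
    """Mean expression per cell type, as single-cell data allows."""
    groups = {}
    for cell_type, expression in cells:
        groups.setdefault(cell_type, []).append(expression)
    return {cell_type: mean(values) for cell_type, values in groups.items()}

non_carrier = make_sample(l5_expression=10.0)
carrier = make_sample(l5_expression=20.0)  # variant doubles expression only in L5 neurons

print(bulk_mean(carrier) / bulk_mean(non_carrier))                        # 1.1
print(pseudobulk(carrier)["L5_exc"] / pseudobulk(non_carrier)["L5_exc"])  # 2.0
```

The same two-fold effect looks like a modest 10% change in the bulk average and a clear doubling once cells are grouped by type.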
link |
02:05:04.160
Completely.
link |
02:05:05.160
Yeah.
link |
02:05:06.160
So I can do that at the RNA level, but I can also do that at the DNA level for the
link |
02:05:10.200
epigenome.
link |
02:05:11.200
So remember how before I was telling you about all this technology that we're using to probe
link |
02:05:14.760
the epigenome, one of them is DNA accessibility.
link |
02:05:18.160
So what we're doing in my lab is that from the same dissociation of say a brain sample
link |
02:05:23.200
where you now have all these tens of thousands of cells floating around, you basically take
link |
02:05:27.480
half of them to do RNA profiling and the other half to do epigenome profiling, both at the
link |
02:05:32.360
single cell level.
link |
02:05:34.140
So that allows you to now figure out what are the millions of DNA enhancers that are
link |
02:05:40.340
accessible in every one of tens of thousands of cells.
link |
02:05:45.000
And computationally, we can now take the RNA and the DNA readouts and group them together
link |
02:05:50.600
to basically figure out how is every enhancer related to every gene.
link |
02:05:57.600
And remember this sort of enhancer-gene linking that we were doing across 833 samples?
link |
02:06:01.720
833 is awesome, don't get me wrong, but 10 million is way more awesome.
link |
02:06:08.240
So we can now look at correlated activity across 2.3 million enhancers and 20,000 genes
link |
02:06:14.600
in each of millions of cells to basically start piecing together the regulatory circuitry
link |
02:06:19.860
of every single type of neuron, every single type of astrocyte, oligodendrocyte, microglial
link |
02:06:25.440
cell inside the brains of 1,500 individuals that we sample across multiple different brain
link |
02:06:32.880
regions across both DNA and RNA.
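Linking enhancers to genes through correlated activity can be sketched in Python as a Pearson correlation between an enhancer's accessibility and a gene's expression across the same ordered set of cell groups. The RNA and accessibility profiles above come from different halves of the dissociated sample, so the sketch assumes they have already been matched to shared groups such as cell types or metacells. In practice that matching is a computational step of its own. Names, values, and the 0.5 cutoff are hypothetical. At the scale of millions of enhancers, candidate pairs are usually restricted, for example to genes near each enhancer, rather than testing every pair as this toy does.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def link_enhancers(accessibility, expression, min_r=0.5):
    """Return (enhancer, gene, r) pairs with r >= min_r, strongest first.

    Both dicts map a name to values over the SAME ordered list of cell groups.
    """
    links = []
    for enhancer, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= min_r:
                links.append((enhancer, gene, r))
    return sorted(links, key=lambda link: link[2], reverse=True)

accessibility = {"enh1": [0, 1, 2, 3], "enh2": [3, 2, 1, 0]}
expression = {"GENE_A": [1, 2, 3, 4]}
print(link_enhancers(accessibility, expression))  # enh1 links to GENE_A with r close to 1.0
```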
link |
02:06:36.240
So that's the data set that my team generated last year alone.
link |
02:06:39.600
So in one year, we basically generated 10 million cells from human brain across a dozen
link |
02:06:46.560
different disorders, across schizophrenia, Alzheimer's, frontotemporal dementia, Lewy
link |
02:06:51.200
body dementia, ALS, Huntington's disease, post-traumatic stress disorder, autism, bipolar
link |
02:07:01.000
disorder, healthy aging, et cetera.
link |
02:07:04.400
So it's possible that even just within that data set lie a lot of keys to understanding
link |
02:07:13.120
these diseases and then be able to, like, directly lead to treatment.
link |
02:07:18.320
Correct.
link |
02:07:19.320
Correct.
link |
02:07:20.320
So basically we are now motivating.
link |
02:07:21.880
Yeah.
link |
02:07:22.880
So our computational team is in heaven right now and we're looking for people.
link |
02:07:25.680
I mean, if you have super smart.
link |
02:07:29.700
So this is a very interesting kind of side question.
link |
02:07:33.080
How much of this is biology?
link |
02:07:34.680
How much of this is computation?
link |
02:07:36.280
So you're the head of the computational biology group, but how much of, should you be comfortable
link |
02:07:44.080
with biology to be able to solve some of these problems?
link |
02:07:48.600
If you put on the several hats you wear, fundamentally, are you thinking
link |
02:07:54.120
like a computer scientist here?
link |
02:07:56.460
You have to.
link |
02:07:57.460
This is the only way.
link |
02:07:59.760
As I said, we are the descendants of the first digital computer.
link |
02:08:02.720
We're trying to understand the digital computer.
link |
02:08:05.000
We're trying to understand the circuitry, the logic of this digital core computer and
link |
02:08:11.240
all of these analog layers surrounding it.
link |
02:08:14.200
So the case that I've been making is that you cannot think one gene at a time.
link |
02:08:19.840
The traditional biology is dead.
link |
02:08:22.080
There's no way you can solve disease with traditional biology.
link |
02:08:24.960
You need it as a component.
link |
02:08:27.240
Once you've figured out IRX3 and IRX5, you can then say, Hey, have you guys worked on
link |
02:08:31.840
those genes with your single gene approach?
link |
02:08:33.880
We'd love to know everything you know.
link |
02:08:35.560
And if you haven't, we now know how important these genes are.
link |
02:08:38.960
Let's now launch a single gene program to dissect them and understand them.
link |
02:08:43.520
But you cannot use that as a way to dissect disease.
link |
02:08:46.680
You have to think genomically.
link |
02:08:48.580
You have to think from the global perspective and you have to build these circuits systematically.
link |
02:08:53.380
So we need large numbers of computer scientists who are interested and willing to dive into
link |
02:08:59.220
these data fully, fully in and extract meaning.
link |
02:09:04.960
We need computer science people who can understand machine learning and inference and decouple
link |
02:09:11.960
these matrices, come up with super smart ways of dissecting them.
link |
02:09:16.360
But we also need computer scientists who understand biology, who are able to design the next generation
link |
02:09:22.880
of experiments.
link |
02:09:24.660
Because many of these experiments, no one in their right mind would design them without
link |
02:09:28.760
thinking of the analytical approach that you would use to deconvolve the data afterwards.
link |
02:09:33.020
Because it's massive amounts of ridiculously noisy data.
link |
02:09:36.640
And if you don't have the computational pipeline in your head before you even design the experiment,
link |
02:09:42.700
you would never design the experiment that way.
link |
02:09:44.760
That's brilliant.
link |
02:09:45.760
So in designing the experiment, you have to see the entirety of the computational pipeline.
link |
02:09:50.160
That drives the design.
link |
02:09:52.600
That even drives the necessity for that design.
link |
02:09:55.560
Basically, you know, if you didn't have a computer scientist way of thinking, you would
link |
02:10:00.320
never design these hugely combinatorial, massively parallel experiments.
link |
02:10:07.360
So that's why you need interdisciplinary teams, you need teams.
link |
02:10:10.680
And I want to sort of clarify what we mean by computational biology group.
link |
02:10:15.200
The focus is not on computational, the focus is on the biology.
link |
02:10:18.880
So we are a biology group.
link |
02:10:20.920
What type of biology?
link |
02:10:22.680
Computational biology.
link |
02:10:23.680
That's the type of biology that uses the whole genome.
link |
02:10:27.760
That's the type of biology that designs experiments, genomic experiments, that can only be interpreted
link |
02:10:33.040
in the context of the whole genome.
link |
02:10:34.600
Right.
link |
02:10:35.600
So it's philosophically looking at biology as a computer.
link |
02:10:39.800
Correct.
link |
02:10:40.800
Correct.
link |
02:10:41.800
Which, in the context of the history of biology, is a big transformation.
link |
02:10:46.280
Yeah.
link |
02:10:47.280
Yeah.
link |
02:10:48.280
You can think of the name as what do we do?
link |
02:10:50.200
Only computation.
link |
02:10:51.240
That's not true.
link |
02:10:52.240
How do we study it?
link |
02:10:53.880
Only computationally.
link |
02:10:54.880
That is true.
link |
02:10:56.520
So all of this single cell sequencing can now be coupled with the technology that we
link |
02:11:00.480
talked about earlier for perturbation.
link |
02:11:02.920
So here's the crazy thing.
link |
02:11:04.560
Instead of using these wells and these robotic systems for doing one drug at a time or for
link |
02:11:10.720
perturbing one gene at a time in thousands of wells, you can now do this using a pool
link |
02:11:16.880
of cells and single cell RNA sequencing.
link |
02:11:20.120
How?
link |
02:11:21.120
You basically can take these perturbations using CRISPR and instead of using a single
link |
02:11:27.960
guide RNA, you can use a library of guide RNAs generated exactly the same way using
link |
02:11:32.920
this array technology.
link |
02:11:34.480
So you synthesize a thousand different guide RNAs.
link |
02:11:38.500
You now take each of these guide RNAs and you insert them in a pool of cells where every
link |
02:11:45.720
cell gets one perturbation.
link |
02:11:48.220
And you use CRISPR, either CRISPR Cas9 to edit the genome with these
link |
02:11:56.720
thousand perturbations, or CRISPR activation, or CRISPR repression.
link |
02:12:01.400
And you now can have a single cell readout where every single cell has received one of
link |
02:12:07.600
these modifications.
link |
02:12:09.600
And you can now in massively parallel ways, couple the perturbation and the readout in
link |
02:12:17.080
a single experiment.
link |
02:12:18.480
How are you tracking which perturbations each cell received?
link |
02:12:21.600
So there are ways of doing that, but basically one way is to make that perturbation an expressible
link |
02:12:27.320
vector so that part of your RNA reading is actually that perturbation itself.
link |
02:12:33.160
So you can basically put it in an expressible part so you can self drive it.
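Once the guide RNA is read out alongside each cell's transcriptome, analysis of the pooled CRISPR screen reduces to two steps, sketched below in Python. The first step assigns each cell its perturbation. The second compares expression in cells carrying each guide against control cells. The guide names, the "non-targeting" control label, the minimum-read cutoff, and the expression values are all hypothetical.

```python
from statistics import mean

def assign_guides(guide_reads, min_reads=3):
    """guide_reads: {cell: {guide: read_count}}.

    Keep only cells with exactly one guide at or above min_reads (a hypothetical cutoff);
    cells with zero or several confident guides are dropped as ambiguous.
    """
    assignment = {}
    for cell, counts in guide_reads.items():
        detected = [guide for guide, n in counts.items() if n >= min_reads]
        if len(detected) == 1:
            assignment[cell] = detected[0]
    return assignment

def perturbation_effects(assignment, expression, control="non-targeting"):
    """Mean expression of one gene per guide, minus the mean in control-guide cells."""
    by_guide = {}
    for cell, guide in assignment.items():
        by_guide.setdefault(guide, []).append(expression[cell])
    baseline = mean(by_guide[control])
    return {guide: mean(values) - baseline
            for guide, values in by_guide.items() if guide != control}

guide_reads = {
    "c1": {"g1": 10},
    "c2": {"g1": 8, "g2": 1},  # g2 is background, below the cutoff
    "c3": {"non-targeting": 12},
    "c4": {"non-targeting": 9},
    "c5": {"g1": 5, "g2": 6},  # two confident guides: ambiguous, dropped
}
expression = {"c1": 2.0, "c2": 4.0, "c3": 10.0, "c4": 12.0, "c5": 100.0}

assignment = assign_guides(guide_reads)
print(perturbation_effects(assignment, expression))  # {'g1': -8.0}
```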
link |
02:12:37.740
So the point that I want to get across is that the sky's the limit.
link |
02:12:42.120
You basically have these tools, these building blocks of molecular biology.
link |
02:12:46.480
We have these massive data sets of computational biology.
link |
02:12:50.280
We have this huge ability to sort of use machine learning and statistical methods and, you
link |
02:12:56.160
know, linear algebra to sort of reduce the dimensionality of all these massive data sets.
link |
02:13:01.880
And then you end up with a series of actionable targets that you can then couple with pharma
link |
02:13:10.960
and just go after systematically.
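Dimensionality reduction of a cells-by-genes matrix is often done with principal component analysis. The Python sketch below finds the first principal component with plain power iteration, so it needs no third-party packages. It is a toy stand-in: real analyses would use a library routine such as NumPy's `numpy.linalg.svd` on far larger matrices. The two-gene data set is invented.

```python
import math
import random

def center(rows):
    """Subtract each column's mean (rows = cells, columns = features such as genes)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def first_principal_component(rows, iterations=100, seed=0):
    """Leading eigenvector of X^T X for centered X, found by power iteration."""
    x = center(rows)
    d = len(x[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        xv = [sum(a * b for a, b in zip(row, v)) for row in x]                # X v
        w = [sum(x[i][j] * xv[i] for i in range(len(x))) for j in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            raise ValueError("data have no variance to reduce")
        v = [c / norm for c in w]
    return v

def project(rows, component):
    """Coordinate of each centered row along one component (d features -> 1 number)."""
    return [sum(a * b for a, b in zip(row, component)) for row in center(rows)]

cells = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # two perfectly co-varying genes
pc1 = first_principal_component(cells)
print([round(c, 4) for c in pc1])  # [0.7071, 0.7071]
```

Because the two genes move together, a single component captures all of the variation, and `project` reduces each cell to one number along it.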
link |
02:13:13.380
So the ability to sort of bring genetics to the epigenomics, to the transcriptomics, to
link |
02:13:19.760
the cellular readouts using these sort of high throughput perturbation technologies
link |
02:13:24.280
that I'm talking about and ultimately to the organismal through the electronic health record
link |
02:13:30.040
endophenotypes and ultimately the disease battery of assays at the cognitive level,
link |
02:13:36.520
at the physiological level and, you know, every other level.
link |
02:13:42.000
There is no better or more exciting field, in my view, to be a computer scientist in,
link |
02:13:46.760
or to be a scientist in, period.
link |
02:13:48.640
Basically this confluence of technologies, of computation, of data, of insight and of
link |
02:13:54.280
tools for manipulation is unprecedented in human history.
link |
02:13:58.860
And I think this is what's shaping the next century to really be a transformative century
link |
02:14:04.620
for our species and for our planet.
link |
02:14:09.440
Do you think the 21st century will be remembered for the big leaps in understanding and alleviation
link |
02:14:17.200
of biology?
link |
02:14:18.800
If you look at the path between discovery and therapeutics, it's been on the order of
link |
02:14:23.720
50 years, it's been shortened to 40, 30, 20, and now it's on the order of 10 years.
link |
02:14:29.660
But the huge number of technologies that are going on right now for discovery will result
link |
02:14:36.400
undoubtedly in the most dramatic manipulation of human biology that we've ever seen in the
link |
02:14:42.600
history of humanity in the next few years.
link |
02:14:45.240
Do you think we might be able to cure some of the diseases we started this conversation
link |
02:14:48.920
with?
link |
02:14:49.920
Absolutely.
link |
02:14:50.920
Absolutely.
link |
02:14:51.920
It's only a matter of time.
link |
02:14:54.320
Basically the complexity is enormous and I don't want to underestimate the complexity
link |
02:14:58.480
but the number of insights is unprecedented and the ability to manipulate is unprecedented
link |
02:15:03.800
and the ability to deliver these small molecules and other non-traditional medicine perturbations,
link |
02:15:11.040
there's a new generation of perturbations that you can use at the DNA level, at the
link |
02:15:17.440
RNA level, at the microRNA level, at the epigenomic level, there's a battery of new
link |
02:15:24.440
generations of perturbations.
link |
02:15:26.560
If you couple that with cell type identifiers that can basically sense when you are in the
link |
02:15:32.120
right cell based on the specific combination and then turn on that intervention for that
link |
02:15:36.840
cell, you can now think of combinatorial interventions where you can basically sort of feed a synthetic
link |
02:15:42.560
biology construct to someone that will basically do different things in different cells.
link |
02:15:47.680
So basically for cancer, this is one of the therapeutics that our collaborator Ron Weiss
link |
02:15:51.500
is using to basically start sort of engineering the circuits that will use microRNA sensors
link |
02:15:56.240
of the environment to sort of know if you're in a tumor cell or if you're in an immune
link |
02:15:59.840
cell or if you're in a stromal cell and so forth and basically turn on particular interventions
link |
02:16:04.180
there.
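The "sense the cell type, then switch on" logic can be written as a small Boolean rule, sketched here in Python. In the constructs described above the logic is built from molecular parts inside the cell, not software, so this only illustrates the decision rule. The marker names, normalized levels, and the 0.5 threshold are made up.

```python
def should_activate(mirna_levels, required_high, required_low, threshold=0.5):
    """Hypothetical AND-gate classifier on normalized microRNA levels (0 to 1).

    Fires only if every 'required_high' marker is above threshold AND every
    'required_low' marker is at or below it; missing markers count as 0.
    """
    high_ok = all(mirna_levels.get(m, 0.0) > threshold for m in required_high)
    low_ok = all(mirna_levels.get(m, 0.0) <= threshold for m in required_low)
    return high_ok and low_ok

# Made-up marker names and profiles, purely for illustration.
target_profile = {"miR_X": 0.9, "miR_Y": 0.8, "miR_Z": 0.1}
other_profile = {"miR_X": 0.9, "miR_Y": 0.2, "miR_Z": 0.7}
print(should_activate(target_profile, ["miR_X", "miR_Y"], ["miR_Z"]))  # True
print(should_activate(other_profile, ["miR_X", "miR_Y"], ["miR_Z"]))   # False
```

Requiring a combination of markers, rather than any single one, is what lets such a rule distinguish cell types that share some markers.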
link |
02:16:05.180
You can sort of create constructs that are tuned to only the liver cells or only the
link |
02:16:11.080
heart cells or only the brain cells and then have these new generations of therapeutics
link |
02:16:18.640
coupled with this immense amount of knowledge on the sort of which targets to choose and
link |
02:16:24.000
what biological processes to measure and how to intervene.
link |
02:16:27.680
My view is that disease is going to be fundamentally altered and alleviated as we go forward.
link |
02:16:36.400
Next time we talk, we'll talk about the philosophical implications of that and the effect of life,
link |
02:16:40.960
but let's stick to biology for just a little longer.
link |
02:16:44.200
We did pretty good today.
link |
02:16:45.200
We stuck to the science.
link |
02:16:49.520
What are you excited about in terms of the future of this field, the technologies in your own
link |
02:16:56.000
group, in your own mind, you're leading the world at MIT in the science and the engineering
link |
02:17:02.560
of this work.
link |
02:17:04.480
So what are you excited about here?
link |
02:17:06.440
I could not be more excited.
link |
02:17:08.920
We are one of many, many teams who are working on this.
link |
02:17:12.720
In my team, the most exciting parts are, you know, manifold.
link |
02:17:17.000
So basically we've now assembled this battery of technologies.
link |
02:17:20.360
We've assembled these massive, massive data sets and now we're really sort of in the stage
link |
02:17:24.960
of our team's path of generating disease insights.
link |
02:17:30.460
So we are simultaneously working on a paper on schizophrenia right now that is basically
link |
02:17:36.480
using the single cell profiling technologies, using these editing and manipulation technologies
link |
02:17:40.880
to basically show how the master regulators underlying changes in the brain that are sort
link |
02:17:47.840
of found in schizophrenia are in fact affecting excitatory neurons and inhibitory neurons
link |
02:17:53.320
in pathways that are active both in synaptic pruning, but also in early development.
link |
02:17:59.280
We've basically found this set of four regulators that are connecting these two processes that
link |
02:18:03.220
were previously separate in schizophrenia, sort of giving a more unified view
link |
02:18:10.200
across those two sides.
link |
02:18:12.720
The second one is in the area of metabolism.
link |
02:18:15.520
We basically now have a beautiful collaboration with the Goodyear lab that's basically looking
link |
02:18:19.280
at multi tissue perturbations in six or seven different tissues across the body in the context
link |
02:18:29.160
of exercise and in the context of nutritional interventions using both mouse and human,
link |
02:18:35.920
where we can basically see what are the cell to cell communications that are changing across
link |
02:18:41.680
them.
link |
02:18:42.680
And what we're finding is this immense role of both immune cells as well as adipocyte
link |
02:18:47.840
stem cells in sort of reshaping that circuitry of all of these different tissues and that's
link |
02:18:53.080
sort of pointing to a new path for therapeutic intervention there.
link |
02:18:56.920
In Alzheimer's, it's this huge focus on microglia and now we're discovering different classes
link |
02:19:02.540
of microglial cells that are basically either synaptic or immune.
link |
02:19:10.360
And these are playing vastly different roles in Alzheimer's versus in schizophrenia.
link |
02:19:16.120
And what we're finding is this immense complexity as you go further and further down of how
link |
02:19:22.400
in fact there are 10 different types of microglia, each with their own sort of expression programs.
link |
02:19:28.400
We used to think of them as, oh yeah, they're microglia, but in fact now we're realizing
link |
02:19:32.480
just even in that sort of least abundant of cell types, there's this incredible diversity
link |
02:19:37.960
there.
link |
02:19:39.620
The differences between brain regions is another sort of major, major insight.
link |
02:19:44.280
Often one would think that, oh, astrocytes are astrocytes no matter where they are.
link |
02:19:48.800
But no, there are incredible region-specific differences in the expression patterns of
link |
02:19:54.240
all of the major brain cell types across different brain regions.
link |
02:19:57.480
So basically there's the neocortical regions that are sort of the recent innovation that
link |
02:20:01.080
makes us so different from all other species.
link |
02:20:03.620
There's the sort of reptilian brain sort of regions that are sort of much more very extremely
link |
02:20:10.080
distinct.
link |
02:20:11.080
There's the cerebellum.
link |
02:20:12.080
Each of those basically is associated in a different way with disease.
link |
02:20:17.520
And what we're doing now is looking into pseudotemporal models for how disease progresses
link |
02:20:23.680
across different regions of the brain.
link |
02:20:25.820
If you look at Alzheimer's, it basically starts in this small region called the entorhinal
link |
02:20:30.000
cortex and then it spreads through the brain and through the hippocampus and ultimately
link |
02:20:38.440
affecting the neocortex.
link |
02:20:39.520
And with every brain region that it hits, it basically has a different impact on the
link |
02:20:46.080
cognitive and memory aspects, orientation, short term memory, long term memory, et cetera,
link |
02:20:52.920
which is dramatically affecting the cognitive path that the individuals go through.
link |
02:20:58.320
So what we're doing now is creating these computational models for ordering the cells
link |
02:21:04.600
and the regions and the individuals according to their ability to predict Alzheimer's disease.
link |
02:21:10.560
So we can have a cell level predictor of pathology that allows us to now create a temporal time
link |
02:21:17.820
course that tells us when every gene turns on along this pathology progression and then
link |
02:21:22.860
trace that across regions and pathological measures that are region specific, but also
link |
02:21:28.040
cognitive measures and so on and so forth.
link |
02:21:30.380
So that allows us to now sort of for the first time, look at can we actually do early intervention
link |
02:21:35.540
for Alzheimer's where we know that the disease starts manifesting for 10 years before you
link |
02:21:40.920
actually have your first cognitive loss.
link |
02:21:44.280
Can we start seeing that path to build new diagnostics, new prognostics, new biomarkers
link |
02:21:50.360
for this sort of early intervention in Alzheimer's?
link |
02:21:54.420
The other aspect that we're looking at is mosaicism.
link |
02:21:57.080
We talked about the common variants and the rare variants, but in addition to those rare
link |
02:22:01.120
variants as your initial cell that forms the zygote divides and divides and divides, with
link |
02:22:08.520
every cell division there are additional mutations that are happening.
link |
02:22:12.480
So what you end up with is your brain being a mosaic of multiple different types of genetic
link |
02:22:18.000
underpinnings.
link |
02:22:19.320
Some cells contain a mutation that other cells don't have.
link |
02:22:23.380
So every human has the common variants that all of us carry to some degree, the rare variants
link |
02:22:31.200
that your immediate tree of the human species carries, and then there's the somatic variant,
link |
02:22:37.360
which is the tree that happened after the zygote that sort of forms your own body.
link |
02:22:44.280
So these somatic alterations are something that has been previously inaccessible to study
link |
02:22:50.840
in human postmortem samples.
link |
02:22:53.240
But right now with the advent of single cell RNA sequencing, in this particular case, we're
link |
02:22:58.240
using the well based sequencing, which is much more expensive, but gives you a lot richer
link |
02:23:01.920
information about each of those transcripts.
link |
02:23:04.560
So we're using now that richer information to infer mutations that have happened in each
link |
02:23:10.200
of the thousands of genes that sort of are active in these cells, and then understand
link |
02:23:16.640
how the genome relates to the function, this genotype phenotype relationship that we usually
link |
02:23:25.320
build in GWAS, genome-wide association studies, between genetic variation and disease.
link |
02:23:31.400
We're now building that at the cell level, where for every cell, we can relate the unique
link |
02:23:36.400
specific genome of that cell with the expression patterns of that cell, and the predicted function
link |
02:23:42.920
using these predictive models that I mentioned before on this regulation for cognition for
link |
02:23:47.480
pathology in Alzheimer's at the cell level.
link |
02:23:51.000
And what we're finding is that the genes that are altered and the genetic regions that are
link |
02:23:54.960
altered in common variants versus rare variants versus somatic variants are actually very
link |
02:23:59.640
different from each other.
link |
02:24:01.280
The somatic variants are pointing to neuronal energetics and oligodendrocyte functions that
link |
02:24:08.860
are not visible in the genetic regions that you find for the common variants, probably
link |
02:24:13.000
because they have too strong of an effect that evolution is just not tolerating them
link |
02:24:17.480
on the common side of the allele frequency spectrum.
link |
02:24:20.960
So the somatic one, that's the variation that happens after the zygote, after you're an individual.
link |
02:24:26.360
I mean, this is a dumb question, but there's mutation and variation, I guess that happens
link |
02:24:31.600
there.
link |
02:24:32.600
And you're saying that through this, if we focus in on individual cells, we're
link |
02:24:37.200
able to detect the story that's interesting there, and that might be a very unique kind
link |
02:24:42.640
of important variability that arises for, you said neuronal or something that would
link |
02:24:49.320
sound...
link |
02:24:50.320
Energetics.
link |
02:24:51.320
Energetics, sounds like a cool term.
link |
02:24:52.320
So, I mean, the metabolism of humans is dramatically altered from that of nearby species.
link |
02:24:59.520
We talked about that last time that basically we are able to consume meat that is incredibly
link |
02:25:04.500
energy rich, and that allows us to sort of have functions that are meeting this humongous
link |
02:25:13.240
brain that we have.
link |
02:25:14.240
So basically on one hand, every one of our brain cells is much more energy efficient
link |
02:25:18.280
than our neighbors, than our relatives.
link |
02:25:20.560
Number two, we have way more of these cells.
link |
02:25:23.360
And number three, we have this new diet that allows us to now feed all these needs.
link |
02:25:30.260
That basically creates a massive amount of damage, oxidative damage from this huge super
link |
02:25:36.540
powered factory of ideas and thoughts that we carry in our skull.
link |
02:25:42.360
And that factory has energetic needs, and there's a lot of sort of biological processes
link |
02:25:47.540
underlying that, that we are finding are altered in the context of Alzheimer's disease.
link |
02:25:52.960
That's fascinating.
link |
02:25:53.960
So you have to consider all of these systems if you want to understand even something like
link |
02:25:59.680
diseases that you would maybe traditionally associate with just the particular cells of
link |
02:26:04.440
the brain.
link |
02:26:07.440
The immune system, the metabolic system.
link |
02:26:11.240
And these are all the things that make us uniquely human.
link |
02:26:13.440
So our immune system is dramatically different from that of our neighbors.
link |
02:26:17.120
Our societies are so much more clustered.
link |
02:26:19.600
The history of infections that have plagued the human population is dramatically different
link |
02:26:24.840
from every other species.
link |
02:26:27.080
The way that our society and our population has sort of exploded has basically put unique
link |
02:26:31.320
pressures on our immune system.
link |
02:26:33.360
And our immune system has both coped with that density and also been shaped by, as I
link |
02:26:37.480
mentioned, the vast amount of death that has happened in the Black Plague and other sort
link |
02:26:42.200
of selective events in human history, famines, ice ages, and so forth.
link |
02:26:47.180
So that's number one on the sort of immune side.
link |
02:26:49.940
On the metabolic side, again, we are able to sort of run marathons.
link |
02:26:55.560
I don't know if you remember the sort of human versus horse experiment where the horse actually
link |
02:26:59.040
tires out faster than the human and the human actually wins.
link |
02:27:03.480
So on the metabolic side, we're dramatically different.
link |
02:27:05.940
On the immune side, we're dramatically different.
link |
02:27:07.560
On the brain side, again, you know, no need to sort of, you know, it's a no brainer of
link |
02:27:12.400
how our brain is like just enormously more capable.
link |
02:27:16.880
And then, you know, on the side of cancer, so basically the cancers that humans are having,
link |
02:27:21.740
the exposures, the environmental exposures are, again, dramatically different.
link |
02:27:25.940
And the lifespan, the expansion of human lifespan is unseen in any other species in, you know,
link |
02:27:32.880
recent evolutionary history.
link |
02:27:35.720
And that now leads to a lot of new disorders that are starting to, you know, manifest late
link |
02:27:42.360
in life.
link |
02:27:43.920
So you know, Alzheimer's is one example where basically, you know, these vast energetic
link |
02:27:48.200
needs over a lifetime of thinking can basically lead to all of this debris and eventually
link |
02:27:54.800
saturate the system and lead to, you know, Alzheimer's in late life.
link |
02:28:00.840
But there's, you know, there's just such a dramatic set of frontiers when it comes to
link |
02:28:07.440
aging research that, you know, so what I often like to say is that if you want to engineer
link |
02:28:14.360
a car to go from 70 miles an hour to 120 miles an hour, that's fine.
link |
02:28:18.240
You can basically, you know, fix a few components.
link |
02:28:20.480
If you wanted to now go at 400 miles an hour, you have to completely redesign the entire
link |
02:28:24.240
car because the system has just not evolved to go that far.
link |
02:28:31.220
Basically our human body has only evolved to live to, I don't know, 120, maybe we can
link |
02:28:36.480
get to 150 with minor changes.
link |
02:28:39.280
But if, you know, as we start pushing these frontiers for not just living, but well living,
link |
02:28:45.240
the eu zen (εὖ ζῆν) that we talked about last time.
link |
02:28:48.240
So to basically push eu zen into the 80s and 90s and hundreds and, you know, much further
link |
02:28:53.200
than that, we will face new challenges that have, you know, never been faced before in
link |
02:29:00.400
terms of cancer, the number of divisions, in terms of Alzheimer's and brain related
link |
02:29:04.200
disorders, in terms of metabolic disorders, in terms of regeneration, there's just so
link |
02:29:08.880
many different frontiers ahead of us.
link |
02:29:10.920
So I am thrilled about where we're heading.
link |
02:29:14.040
So basically I see this confluence in my lab and many other labs of AI, of, you know, sort
link |
02:29:20.600
of, you know, the next frontier of AI for drug design.
link |
02:29:22.920
So basically these sort of graph neural networks on specific chemical designs that allow you
link |
02:29:30.520
to create new generations of therapeutics.
link |
02:29:34.840
These molecular biology tricks for intervening in the system at every level, these personalized
link |
02:29:42.400
medicine prediction, diagnosis, and prognosis using the electronic health records and using
link |
02:29:49.960
these polygenic risk scores weighted by the burden, the number of mutations that are accumulating
link |
02:29:56.640
across common, rare, and somatic variants, the burden converging across all of these different
link |
02:30:03.340
molecular pathways, the delivery of specific drugs and specific interventions into specific
link |
02:30:10.000
cell types.
link |
02:30:11.000
And again, you've talked with Bob Langer about this, there's, you know, many giants in that
link |
02:30:14.080
field.
link |
02:30:15.080
And then the last concept is not intervening at the single gene level.
link |
02:30:20.560
I want you to sort of conceptualize the concept of an on-target side effect.
link |
02:30:27.600
What is an on-target side effect?
link |
02:30:29.200
An off-target side effect is when you design a molecule to target one gene and instead
link |
02:30:33.320
it targets another gene and you have side effects because of that.
link |
02:30:36.720
An on-target side effect is when your molecule does exactly what you were expecting, but
link |
02:30:41.040
that gene is pleiotropic.
link |
02:30:43.840
Pleio means many, tropos means ways: many ways, it acts in many ways.
link |
02:30:48.160
It's a multifunctional gene.
link |
02:30:50.040
So you find that this gene plays a role in this, but as we talked about the wiring of
link |
02:30:55.320
genes to phenotypes is extremely dense and extremely complex.
link |
02:30:59.000
So the next stage of intervention will be intervening not at the gene level, but at
link |
02:31:04.000
the network level.
link |
02:31:06.160
Intervening at the set of pathways and the set of genes with multi-input perturbations
link |
02:31:11.440
to the system, multi-input modulations, pharmaceutical or other interventions, that basically
link |
02:31:18.040
allow you to now work at the sort of full level of understanding, not just in your brain,
link |
02:31:24.980
but across your body, not just in one gene, but across the set of pathways and so on and
link |
02:31:29.400
so forth for every one of these disorders.
link |
02:31:31.980
So I think that we're finally at the level of systems medicine of basically instead of
link |
02:31:37.320
sort of medicine being at the single gene level, medicine being at the systems level
link |
02:31:42.120
where it can be personalized based on the specific set of genetic markers and genetic
link |
02:31:46.480
perturbations that you are either born with or that you have developed during your lifetime.
link |
02:31:53.040
Your unique set of exposures, your unique set of biomarkers, and your unique set of
link |
02:31:59.480
current set of conditions through your EHR and other ways.
link |
02:32:06.480
And the precision component of intervening extremely precisely in the specific pathways
link |
02:32:12.920
and the specific combinations of genes that should be modulated to sort of bring you from
link |
02:32:16.840
the disease state to the physiologically normal state or even to physiologically improved
link |
02:32:23.480
state through this combination of interventions.
link |
02:32:25.640
So that's, in my view, the field where basically computer science comes together with artificial
link |
02:32:30.080
intelligence, statistics, all of these other tools, molecular biology technologies and
link |
02:32:34.200
biotechnology and pharmaceutical technologies that are sort of revolutionary in the way
link |
02:32:37.960
of intervention.
link |
02:32:38.960
And of course, this massive amount of molecular biology and data gathering and generation
link |
02:32:43.240
and perturbation in massively parallel ways.
link |
02:32:46.360
So there's no better way.
link |
02:32:47.700
There's no better time.
link |
02:32:49.740
There's no better place to be sort of looking at this whole confluence of ideas.
link |
02:32:56.800
And I'm just so thrilled to be a small part of this amazing, enormous ecosystem.
link |
02:33:01.440
It's exciting to imagine what humans of 100, 200 years from now, what their life experience
link |
02:33:07.520
is like, because these ideas seem to have potential to transform the quality of life
link |
02:33:13.720
that, when they look back at us, they'll probably wonder how we put up with all the suffering
link |
02:33:22.200
in the world.
link |
02:33:23.200
Manolis, it's a huge honor.
link |
02:33:25.480
Thank you for spending this early Sunday morning with me.
link |
02:33:29.240
I deeply appreciate it.
link |
02:33:30.240
See you next time.
link |
02:33:31.240
Sounds like a plan.
link |
02:33:32.240
Thank you, Lex.
link |
02:33:33.960
Thanks for listening to this conversation with Manolis Kellis.
link |
02:33:36.880
And thank you to our sponsors, SEMrush, which is an SEO optimization tool.
link |
02:33:43.280
Pessimist Archive, which is one of my favorite history podcasts.
link |
02:33:47.400
8Sleep, which is a self cooling mattress with smart sensors and an app.
link |
02:33:52.680
And finally, BetterHelp, which is an online therapy service.
link |
02:33:57.360
Please check out these sponsors in the description to get a discount and to support this podcast.
link |
02:34:02.520
If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts,
link |
02:34:08.000
follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
link |
02:34:13.240
And now, let me leave you with some words from Haruki Murakami.
link |
02:34:19.120
Human beings are ultimately nothing but carriers, passageways for genes.
link |
02:34:24.580
They ride us into the ground like racehorses from generation to generation.
link |
02:34:30.300
Genes don't think about what constitutes good or evil.
link |
02:34:34.160
They don't care whether we're happy or unhappy.
link |
02:34:37.040
We're just means to an end for them.
link |
02:34:40.060
The only thing they think about is what is most efficient for them.
link |
02:34:45.960
Thank you for listening, and hope to see you next time.