Manolis Kellis: Biology of Disease | Lex Fridman Podcast #133
The following is a conversation with Manolis Kellis, his third time on the podcast.
He is a professor at MIT and head of the MIT Computational Biology Group.
This time we went deep on the science, biology, and genetics.
So this is a bit of an experiment.
Manolis went back and forth between the basics of biology and the latest state of the art
He's a master at this, so I just sat back and enjoyed the ride.
This conversation happened at 7am, so it's yet another podcast episode after an all-nighter
And once again, since the universe has a sense of humor, this one was a tough one for my
brain to keep up, but I did my best and I never shy away from a good challenge.
Quick mention of each sponsor, followed by some thoughts related to the episode.
First is SEMrush, the most advanced SEO optimization tool I've ever come across.
I don't like looking at numbers, but someone probably should; it helps you make good decisions.
Second is Pessimist Archive, they're back, one of my favorite history podcasts on why
people resist new things from recorded music to umbrellas to cars, chess, coffee, and the
Third is 8sleep, a mattress that cools itself, measures heart rate variability, has an app,
and has given me yet another reason to look forward to sleep, including the all-important
And finally, BetterHelp, online therapy when you want to face your demons with a licensed
professional, not just by doing the David Goggins-like physical challenges like I seem
to do on occasion.
Please check out these sponsors in the description to get a discount and to support this podcast.
As a side note, let me say that biology in the brain and in the various systems of the
body fills me with awe every time I think about how such a chaotic mess coming from its humble
origins in the ocean was able to achieve such incredibly complex and robust mechanisms of
life that survived despite all the forces of nature that want to destroy it.
It is so unlike the computing systems we humans have engineered that it makes me feel that
in order to create artificial general intelligence and artificial consciousness, we may have
to completely rethink how we engineer computational systems.
If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts, follow
on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
And now, here's my conversation with Manolis Kellis.
So your group at MIT is trying to understand the molecular basis of human disease.
What are some of the biggest challenges in your view?
Don't get me started.
I mean, understanding human disease is the most complex challenge in modern science.
So because human disease is as complex as the human genome, it is as complex as the
human brain, and it is in many ways even more complex, because the more we understand
disease complexity, the more we start understanding genome complexity and epigenome complexity
and brain circuitry complexity and immune system complexity and cancer complexity and
so on and so forth.
So traditionally, human disease was following basic biology.
You would basically understand basic biology in model organisms like, you know, mouse and
You would understand sort of mammalian biology and animal biology and eukaryotic biology
in sort of progressive layers of complexity, getting closer to human phylogenetically.
And you would do perturbation experiments in those species to see if I knock out a gene,
And based on the knocking out of these genes, you would basically then have a way to drive
human biology because you would sort of understand the functions of these genes.
And then if you find that a human gene locus, something that you've mapped from human genetics
to that gene, is related to a particular human disease, you'd say, aha, now I know the function
of the gene from the model organisms.
I can now go and understand the function of that gene in human.
But this is all changing.
This has dramatically changed.
So that was the old way of doing basic biology.
You would start with the animal models, the eukaryotic models, the mammalian models, and
then you would go to human.
Human genetics has been so transformed in the last decade or two that human genetics
is now actually driving the basic biology.
There is more genetic mutation information in the human genome than there will ever be
in any other species.
What do you mean by mutation information?
So perturbations are how you understand systems.
So an engineer builds systems and then they know how they work from the inside out.
A scientist studies systems through perturbations.
You basically say, if I poke that balloon, what's going to happen?
And I'm going to film it in super high resolution, understand, I don't know, aerodynamics or
fluid dynamics if it's filled with water, et cetera.
So you can then experiment by perturbation, and then the scientific process is sort of
building models that best fit the data, designing new experiments that best test your models
and challenge your models and so on and so forth.
This is the same thing with science.
Basically if you're trying to understand biological science, you basically want to do perturbations
that then drive the models.
So how do these perturbations allow you to understand disease?
So if you know that a gene is related to disease, you don't want to just know that it's related
You want to know what is the disease mechanism because you want to go and intervene.
So the way that I like to describe it is that traditionally epidemiology, which is basically
the study of disease, you know, sort of the observational study of disease, has been about
correlating one thing with another thing.
So if you have a lot of people with liver disease who are also alcoholics, you might
say, well, maybe the alcoholism is driving the liver disease or maybe those who have
liver disease self-medicate with alcohol.
So the connection could be either way.
With genetic epidemiology, it's about correlating changes in the genome with phenotypic differences
and then you know the direction of causality.
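The direction is known because genotype is fixed at conception, so the disease cannot change it. One standard way to turn that into an effect estimate is Mendelian randomization; the conversation does not name this method, so treat the following as an illustrative sketch under stated assumptions: a simulated variant, exposure, and outcome with made-up effect sizes, one hypothetical unmeasured confounder, and the simple Wald ratio estimator.

```python
import numpy as np

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def wald_ratio(g, x, y):
    """Mendelian-randomization (Wald ratio) estimate of the causal effect of x on y,
    using genotype g as the instrument: slope(g -> y) / slope(g -> x)."""
    return slope(g, y) / slope(g, x)

rng = np.random.default_rng(0)
n = 1_000_000
g = rng.binomial(2, 0.3, size=n)              # allele dosage 0/1/2, fixed at conception
u = rng.normal(size=n)                        # hypothetical unmeasured confounder
x = 0.4 * g + u + rng.normal(size=n)          # exposure, influenced by the variant and by u
y = 0.5 * x + 2.0 * u + rng.normal(size=n)    # outcome; the true causal effect of x is 0.5

print(f"observational slope: {slope(x, y):.2f}")       # biased upward by the confounder
print(f"Wald ratio estimate: {wald_ratio(g, x, y):.2f}")  # should land near 0.5
```

The ratio is only valid if the variant affects the outcome solely through the exposure (no pleiotropy), an assumption real analyses have to check.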
So if you know that a particular gene is related to the disease, you can basically say, okay,
perturbing that gene in mouse causes the mice to have X phenotype.
So perturbing that gene in human causes the humans to have the disease.
So I can now figure out what are the detailed molecular phenotypes in the human that are
related to that organismal phenotype in the disease.
So it's all about understanding disease mechanism, understanding what are the pathways, what
are the tissues, what are the processes that are associated with the disease so that we
know how to intervene.
You can then prescribe particular medications that also alter these processes.
You can prescribe lifestyle changes that also affect these processes and so on and so forth.
That's such a beautiful puzzle to try to solve.
Like what kind of perturbations eventually have this ripple effect that leads to disease
across the population.
And then you study that in animals or mice first and then see how that might possibly
connect to humans.
How hard is that puzzle of trying to figure out how little perturbations might lead to,
in a stable way, to a disease?
In animals, we make the puzzle simpler because we perturb one gene at a time.
That's the beauty of this, the power of animal models.
You can basically decouple the perturbations.
You only do one perturbation and you only do strong perturbations at a time.
In human, the puzzle is incredibly complex because obviously you don't do human experimentation.
You wait for natural selection and natural genetic variation to basically do its own
experiments, which it has been doing for hundreds and thousands of years in the human population
and for hundreds of thousands of years across the history leading to the human population.
So you basically take this natural genetic variation that we all carry within us.
Every one of us carries 6 million perturbations.
So I've done 6 million experiments on you, 6 million experiments on me, 6 million experiments
on every one of 7 billion people on the planet.
What does the 6 million correspond to?
6 million unique genetic variants that are segregating in the human population.
Every one of us carries millions of polymorphic sites, poly, many, morph, forms.
Polymorphic means many forms, variants.
That basically means that every one of us has single nucleotide alterations that we
have inherited from mom and from dad that basically can be thought of as tiny little
Most of them don't do anything, but some of them lead to all of the phenotypic differences
that we see between us.
The reason why two twins are identical is because these variants completely determine
the way that I'm going to look at exactly 93 years of age.
How happy are you with this kind of data set?
Is it large enough, the human population of Earth?
Is that too big, too small?
Yeah, so whether it's large enough is a power analysis question.
In every one of our grants, we do a power analysis based on what is the effect size
that I would like to detect and what is the natural variation in the two forms.
Every time you do a perturbation, you're asking, I'm changing form A into form B. Form A has
some natural phenotypic variation around it and form B has some natural phenotypic variation
If those variances are large and the difference between the mean of A and the mean of B is
small, then you have very little power.
The further the means go apart, that's the effect size, the more power you have, and
the smaller the standard deviation, the more power you have.
So basically when you're asking, is that sufficiently large, certainly not for everything, but we
already have enough power for many of the stronger effects in the tighter distributions.
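The power argument above (means further apart and tighter distributions both help) can be made concrete with a standard two-sample z-test approximation. This is a minimal sketch, assuming normally distributed phenotypes with equal standard deviation in both forms and equal group sizes; the function name and the numbers in the loop are illustrative, not taken from the group's grant calculations.

```python
from statistics import NormalDist

def two_group_power(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test comparing form A with form B.

    Assumes both forms share the same phenotypic standard deviation `sd`
    and that `n_per_group` individuals carry each form.
    """
    std_normal = NormalDist()
    effect_size = abs(mean_b - mean_a) / sd          # distance between the means in SD units
    se = (2 / n_per_group) ** 0.5                    # standard error of that standardized difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # two-sided critical value
    # Chance the test statistic clears the critical value; the opposite tail is ignored.
    return std_normal.cdf(effect_size / se - z_crit)

# Widening the gap between the means, or tightening the distributions, both raise power.
for diff, sd in [(0.1, 1.0), (0.3, 1.0), (0.1, 0.5)]:
    print(f"mean difference={diff}, sd={sd}: power={two_group_power(0.0, diff, sd, 500):.2f}")
```

Genome-wide association studies test millions of variants, so they conventionally use a significance threshold of 5e-8 rather than 0.05; passing `alpha=5e-8` shows how sharply that lowers power at a fixed sample size.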
So that's the hopeful message, that there exist parts of the genome that have a strong effect
that have a small variance.
That's exactly right.
Unfortunately, those perturbations are the basis of disease in many cases.
So it's not a hopeful message.
Sometimes it's a terrible message.
It's basically, well, some people are sick, but if we can figure out what are these contributors
to sickness, we can then help make them better and help make many other people better who don't
carry that exact mutation, but who carry mutations on the same pathways.
And that's what we like to call the allelic series of a gene.
You basically have many perturbations of the same gene in different people, each with a
different frequency in the human population and each with a different effect on the individual
that carries them.
So you said in the past there would be these small experiments on perturbations and animal
What does this puzzle-solving process look like today?
So we basically have something like 7 billion people on the planet and every one of them
carries something like 6 million mutations.
You basically have an enormous matrix of genotype by phenotype by systematically measuring the
phenotype of these individuals.
And the traditional way of measuring this phenotype has been to look at one trait at
You would gather families and you would sort of paint the pedigrees of a strong effect,
what we like to call a Mendelian mutation, so a mutation that gets transmitted in a dominant
or a recessive but strong-effect form where basically one locus plays a very big role
And you could then look at carriers versus non-carriers in one family, carriers versus
non-carriers in another family and do that for hundreds, sometimes thousands of families
and then trace these inheritance patterns and then figure out what is the gene that
Is this the matrix that you're showing in talks or lectures?
So that matrix is the input to the stuff that I show in talks.
So basically that matrix has traditionally been strong-effect genes.
What the matrix looks like now is instead of pedigrees, instead of families, you basically
have thousands and sometimes hundreds of thousands of unrelated individuals, each with all of
their genetic variants and each with their phenotype, for example, height or lipids or,
you know, whether they're sick or not for a particular trait.
That has been the modern view: instead of going to families, going to unrelated individuals
with one phenotype at a time.
And what we're doing now as we're maturing in all of these sciences is that we're doing
this in the context of large medical systems or enormous cohorts that are very well phenotyped
across hundreds of phenotypes, sometimes with their complete electronic health records.
So you can now start relating not just one gene segregating in one family, not just thousands
of variants segregating with one phenotype, but now you can do millions of variants versus
hundreds of phenotypes.
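The simplest first pass over a genotype-by-phenotype matrix like this is a marginal scan: regress every phenotype on every variant, one pair at a time. The sketch below runs on simulated data and is not the group's pipeline; real analyses also adjust for covariates and population structure and must account for correlation (linkage) between nearby variants. The matrix sizes and the single planted effect are hypothetical.

```python
import numpy as np

def association_scan(genotypes, phenotypes):
    """Marginal association of every variant with every phenotype.

    genotypes:  (n_individuals, n_variants) allele dosages 0/1/2, no monomorphic columns
    phenotypes: (n_individuals, n_phenotypes) quantitative traits
    Returns an (n_variants, n_phenotypes) matrix of t-statistics, one per
    simple linear regression of a phenotype on a variant.
    """
    n = genotypes.shape[0]
    gz = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    pz = (phenotypes - phenotypes.mean(axis=0)) / phenotypes.std(axis=0)
    r = gz.T @ pz / n                            # Pearson correlation for every variant-phenotype pair
    return r * np.sqrt((n - 2) / (1 - r**2))     # equivalent t-statistic of the regression slope

rng = np.random.default_rng(1)
n, m, k = 5_000, 500, 20                         # individuals, variants, phenotypes (toy sizes)
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
P = rng.normal(size=(n, k))
P[:, 3] += 0.2 * G[:, 42]                        # plant one true effect: variant 42 -> phenotype 3

T = association_scan(G, P)
variant, phenotype = np.unravel_index(np.abs(T).argmax(), T.shape)
print(f"strongest association: variant {variant}, phenotype {phenotype}, t = {T[variant, phenotype]:.1f}")
```

Standardizing both matrices turns the whole scan into one matrix product, which is why it scales to many variants and many phenotypes at once.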
And as a computer scientist, I mean, deconvolving that matrix, partitioning it into the layers
of biology that are associated with every one of these elements is a dream come true.
It's like the world's greatest puzzle.
And you can now solve that puzzle by throwing in more and more knowledge about the function
of different genomic regions and how these functions are changed across tissues and in
the context of disease.
And that's what my group and many other groups are doing.
We're trying to systematically relate this genetic variation with molecular variation
at the expression level of the genes, at the epigenomic level of the gene regulatory circuitry,
and at the cellular level of what are the functions that are happening in those cells,
at the single-cell level using single-cell profiling, and then relate all that vast amount
of knowledge computationally with the thousands of traits that each of these thousands
of variants is perturbing.
I mean, this is something we talked about, I think last time.
So there are these effects at different levels that happen.
You said at a single-cell level, you're trying to see things that happen due to certain perturbations.
And then it's not just like a puzzle of perturbation and disease.
It's perturbation then effect at a cellular level, then at an organ level, a body, like,
how do you disassemble this into like what your group is working on?
You're basically taking a bunch of the hard problems in the space.
How do you break apart a difficult disease and break it apart into problems that you,
into puzzles that you can now start solving?
So there's a struggle here.
Computer scientists love hard puzzles and they're like, oh, I want to build a method that just
deconvolves the whole thing computationally.
And that's very tempting and it's very appealing, but biologists just like to decouple that
complexity experimentally, to just like peel off layers of complexity experimentally.
And that's what many of these modern tools that my group and others have both developed
The fact that we can now figure out tricks for peeling off these layers of complexity
by testing one cell type at a time or by testing one cell at a time.
And you could basically say, what is the effect of these genetic variants associated with
Alzheimer's on the human brain?
The human brain sounds like, oh, it's an organ, of course, just go one organ at a time.
But the human brain has, of course, dozens of different brain regions and within each of these brain
regions, dozens of different cell types and every single type of neuron, every single
type of glial cell between astrocytes, oligodendrocytes, microglia, between all of the neural cells
and the vascular cells and the immune cells that are co-inhabiting the brain, between the
different types of excitatory and inhibitory neurons that are sort of interacting with
each other, between different layers of neurons in the cortical layers.
Every single one of these has a different type of function to play in cognition, in
interaction with the environment, in maintenance of the brain, in energetic needs, in feeding
the brain with blood, with oxygen, in clearing out the debris that results from the
super high energy production of cognition in humans.
So all of these things are basically potentially deconvolvable computationally, but experimentally,
you can just do single-cell profiling of dozens of regions of the brain across hundreds of
individuals across millions of cells.
And then now you have pieces of the puzzle that you can then put back together to understand
I mean, first of all, the cells in the human brain are the most, maybe I'm romanticizing
it, but cognition seems to be very complicated.
So separating into the function, breaking Alzheimer's down to the cellular level seems
Is that basically you're trying to find a way that some perturbation in the genome results
in some obvious major dysfunction in the cell?
You're trying to find something like that.
So what does human genetics do?
Human genetics basically looks at the whole path from genetic variation all the way to
So human genetics has basically taken thousands of Alzheimer's cases and thousands of controls
matched for age, for sex, for environmental backgrounds and so on and so forth.
And then looked at that map where you're asking, what are the individual genetic perturbations
and how are they related all the way to Alzheimer's disease?
And that has actually been quite successful.
So we now have more than 27 different loci; these are genomic regions that are associated
with Alzheimer's at this end-to-end level.
But the moment you sort of break up that very long path into smaller levels, you can basically
say from genetics, what are the epigenomic alterations at the level of gene regulatory
elements where that genetic variant perturbs the control region nearby.
That effect is much larger.
You mean much larger in terms of this down-the-line impact, or?
It's much larger in terms of the measurable effect; this A versus B variance is actually
so much more cleanly defined when you go to the shorter branches.
Because for one genetic variant to affect Alzheimer's, that's a very long path.
That basically means that in the context of millions of these 6 million variants that
every one of us carries, that one single nucleotide has a detectable effect all the way to the
I mean, it's just mind-boggling that that's even possible, but indeed there are such effects.
So the hope is, or scientifically speaking, the most effective place to
detect the alteration that results in disease is earlier on in the pipeline, as early as
If you go very early on in the pipeline, now each of these epigenomic alterations, for
example, this enhancer control region is active maybe 50% less, which is a dramatic effect.
Now you can ask, well, how much does changing one regulatory region in the genome in one
cell type change disease?
Well, that path is now long.
So if you instead look at expression, the path between genetic variation and the expression
of one gene goes through many enhancer regions, and therefore it's a subtler effect at the
But then now you're closer because one gene is acting in the context of only 20,000 other
genes as opposed to one enhancer acting in the context of 2 million other enhancers.
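One way to see why an effect looks larger on the shorter branches is Sewall Wright's path-tracing rule: in a simple linear chain of standardized variables, the correlation between the two ends equals the product of the correlations along each link, so every extra step dilutes the signal. The chain below (variant, enhancer activity, gene expression, disease liability) and its coefficients are hypothetical, and real regulatory effects are neither linear nor a single chain; this is only a sketch of the dilution.

```python
import numpy as np

def simulate_chain(path_coefs, n, rng):
    """Simulate a linear causal chain of standardized variables.

    Node 0 is a standardized allele dosage; each later node is
    a * parent + sqrt(1 - a**2) * noise, so every node has unit variance
    and correlates with its parent at `a` in expectation.
    """
    g = rng.binomial(2, 0.3, size=n).astype(float)
    node = (g - g.mean()) / g.std()
    nodes = [node]
    for a in path_coefs:
        node = a * node + np.sqrt(1 - a**2) * rng.normal(size=n)
        nodes.append(node)
    return nodes

# Hypothetical per-link effects: variant -> enhancer -> expression -> disease liability
coefs = [0.7, 0.3, 0.2]
labels = ["enhancer activity", "gene expression", "disease liability"]
rng = np.random.default_rng(2)
nodes = simulate_chain(coefs, 500_000, rng)

for depth, (label, node) in enumerate(zip(labels, nodes[1:]), start=1):
    observed = np.corrcoef(nodes[0], node)[0, 1]
    expected = np.prod(coefs[:depth])
    print(f"variant vs {label}: r = {observed:.3f} (product rule: {expected:.3f}), "
          f"variance explained = {observed**2:.4f}")
```

With these made-up numbers the variant's correlation with disease liability is 0.7 × 0.3 × 0.2 = 0.042, so it explains well under 1% of the variance even though its effect on the enhancer is strong.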
So you basically now have genetic, epigenomic (the circuitry), transcriptomic (the gene expression
control), and then cellular, where you can basically say, I can measure various properties
What is the calcium influx rate when I have this genetic variation?
What is the synaptic density?
What is the electric impulse conductivity and so on and so forth?
So you can measure things along this path to disease, and you can also measure endophenotypes.
You can basically measure your brain activity.
You can do imaging in the brain.
You can basically measure, I don't know, the heart rate, the pulse, the lipids, the amount
of blood secreted and so on and so forth.
And then through all of that, you can basically get at the path to causality, the path to
And is there something beyond cellular?
So you mentioned lifestyle interventions or changes as a way to, or like be able to prescribe
changes in lifestyle.
Like what about organs?
What about like the function of the body as a whole?
So basically when you go to your doctor, they always measure, you know, your pulse.
They always measure your height.
They always measure your weight, you know, your BMI.
So basically these are just very basic variables.
But with digital devices nowadays, you can start measuring hundreds of variables for
You can basically also phenotype Alzheimer's patients cognitively through tests.
There are cognitive tests that you can measure, that you typically do for cognitive decline,
these mini-mental state examinations that have specific questions.
You can think of sort of enlarging the set of cognitive tests.
So in the mouse, for example, you do experiments for how do they get out of mazes?
How do they find food?
Whether they recall a fear, whether they shake in a new environment and so on and so forth.
In the human, you can have much, much richer phenotypes where you can basically say not
just imaging at the organ level and all kinds of other activities at the organ level, but
you can also go to the organism level: you can do behavioral tests.
And how did they do on empathy?
How did they do on memory?
How did they do on long-term memory versus short-term memory?
And so on and so forth.
I love how you're calling that phenotype.
But like your behavior patterns that might change over a period of a life, your ability
to remember things, your ability to be empathetic or emotionally, your intelligence perhaps
Yeah, but intelligence has hundreds of variables.
It can be your math intelligence, your literary intelligence, your puzzle-solving intelligence,
It could be like hundreds of things.
And all of that, we're able to measure that better and better and all that could be connected
to the entire pipeline somehow.
We used to think of each of these as a single variable like intelligence.
I mean, that's ridiculous.
It's basically dozens of different genes that are controlling every single variable.
You can basically think of, imagine us in a video game where every one of us has measures
of strength, stamina, energy left and so on and so forth.
But you could click on each of those five bars that are just the main bars and each
of those will just give you then hundreds of bars and can basically say, okay, great
for my machine learning task, I want someone who, a human who has these particular forms
I require now these 20 different things.
And then you can combine those things and then relate them to, of course, performance
in a particular task, but you can also relate them to genetic variation that might be affecting
different parts of the brain.
For example, your frontal cortex versus your temporal cortex versus your visual cortex
and so on and so forth.
So genetic variation that affects expression of genes in different parts of your brain
can basically affect your music ability, your auditory ability, your smell, just dozens
of different phenotypes can be broken down into hundreds of cognitive variables and then
relate each of those to thousands of genes that are associated with them.
As somebody who loves RPGs or playing games, there are too few variables that we can control.
So I'm excited if we're in fact living in a simulation and this is a video game, I'm
excited by the quality of the video game.
The game designer did a hell of a good job.
So we're impressed.
The sunset last night was a little unrealistic.
To zoom back out, we've been talking about the genetic origins of diseases, but I think
it's fascinating to talk about what are the most important diseases to understand and
especially as it connects to the things that you're working on.
So it's very difficult to think about important diseases to understand.
There are many metrics of importance.
One is lifestyle impact.
I mean, if you look at COVID, the impact on lifestyle has been enormous.
So understanding COVID is important because it has impacted the well-being in terms of
ability to have a job, ability to have an apartment, ability to go to work, ability
to have a mental circle of support and all of that for millions of Americans, like huge,
So that's one aspect of importance.
So basically mental disorders, Alzheimer's has a huge importance in the well-being of
Whether or not it kills someone for many, many years, it has a huge impact.
So the first measure of importance is just well-being.
Impact on the quality of life.
Impact on the quality of life, absolutely.
The second metric, which is much easier to quantify, is deaths.
What is the number one killer?
The number one killer is actually heart disease.
It is actually killing 650,000 Americans per year.
Number two is cancer with 600,000 Americans.
Number three, far, far down the list, is accidents, every single accident combined.
So basically you read the news, accidents, like there was a huge car crash all over the
But the number of deaths, number three by far, 167,000.
Number four, respiratory disease.
So that's asthma, not being able to breathe and so on and so forth, 160,000. Alzheimer's,
number five with 120,000, and then stroke, brain aneurysms and so on and so forth, that's
147,000. Diabetes and metabolic disorders, et cetera.
The flu is 60,000, suicide, 50,000 and then overdose, et cetera, you know, goes further
So of course COVID has crept up to be the number three killer this year with, you know,
more than 100,000 Americans and counting.
And you know, but if you think about sort of what do we use, what are the most important
diseases, you have to understand both the quality of life and the sheer number of deaths
and just the number of years lost, if you wish.
And each of these diseases you can think of as, and also including terrorist attacks and
school shootings, for example, things which lead to fatalities, you can look at as problems
that could be solved.
And some problems are harder to solve than others.
I mean, that's part of the equation.
So maybe if you look at these diseases, if you look at heart disease or cancer or Alzheimer's
or just like schizophrenia and obesity, diabetes, like not necessarily things that kill you,
but affect the quality of life, which problems are solvable, which aren't, which are harder
to solve, which aren't.
I love your question because it puts it in the context of a global effort rather than
just the local effort.
So basically if you look at the global aspect, exercise and nutrition are two interventions
that we as a society can do a much better job at.
So if you think about sort of the availability of cheap food, it's extremely high in calories.
It's extremely detrimental for you, like a lot of processed food, et cetera.
So if we change that equation and as a society, we made availability of healthy food much,
much easier and charged for a burger at McDonald's the price that it costs the health system,
then people would actually start buying more healthy foods.
So basically that's sort of a societal intervention, if you wish.
In the same way, increasing empathy, increasing education, increasing the social framework
and support would basically lead to fewer suicides.
It would lead to fewer murders.
It would lead to fewer deaths overall.
So that's something that we as a society can do.
You can also think about external factors versus internal factors.
So the external factors are basically communicable diseases like COVID, like the flu, et cetera.
And the internal factors are basically things like cancer and Alzheimer's where basically
your genetics will eventually drive you there.
And then of course, with all of these factors, every single disease has both a genetic
component and an environmental component.
So heart disease has a huge genetic contribution; Alzheimer's, it's like 60% plus genetic.
So I think it's like 79% heritability.
So that basically means that genetics alone explains 79% of the variation in Alzheimer's risk.
And yes, there's a 21% environmental component where you could basically enrich your cognitive
environment, enrich your social interactions, read more books, learn a foreign language,
go running, you know, sort of have a more fulfilling life.
All of that will actually decrease Alzheimer's, but there's a limit to how much that can impact
because of the huge genetic footprint.
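Heritability is a ratio of variances, h² = Var(G) / Var(P): the share of the population's variation in risk that tracks genetic differences, not the share of any one person's disease that is caused by genes. The sketch below uses the liability-threshold model, a standard way to treat a yes/no disease as a continuous underlying risk, together with the 79% figure quoted above. The 10% prevalence cutoff and the size of the environmental shift are hypothetical. It illustrates both halves of the point: heritability is high, yet shifting the environmental term still lowers the number of cases.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
h2_true = 0.79                                        # heritability figure quoted in the conversation
genetic = rng.normal(0.0, np.sqrt(h2_true), n)        # additive genetic liability
environment = rng.normal(0.0, np.sqrt(1 - h2_true), n)
liability = genetic + environment                     # total liability, variance close to 1

h2_hat = genetic.var() / liability.var()              # h^2 = Var(G) / Var(P)

# Hypothetical: individuals in the top 10% of liability develop the disease.
threshold = np.quantile(liability, 0.90)
prevalence = (liability > threshold).mean()

# Hypothetical environmental intervention lowering everyone's environmental term by 0.2.
prevalence_after = (genetic + (environment - 0.2) > threshold).mean()

print(f"estimated h2 = {h2_hat:.2f}")
print(f"prevalence before = {prevalence:.3f}, after = {prevalence_after:.3f}")
```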
So this is fascinating.
So each one of these problems has a genetic component and an environmental component.
And so like when there's a genetic component, what can we do about some of these diseases?
And from what you've worked on, what can you say in terms of problems that are solvable here
or understandable?
So my group works on the genetic component, but I would argue that understanding the genetic
component can have a huge impact even on the environmental component.
Because genetics gives us access to mechanism.
And if we can alter the mechanism, if we can impact the mechanism, we can perhaps counteract
some of the environmental components.
So understanding the biological mechanisms leading to disease is extremely important
in being able to intervene.
But when you can intervene and what, you know, the analogy that I like to give is for example,
for obesity, you know, think of it as a giant bathtub of fat.
There's basically fat coming in from your diet and there's fat coming out from your
So that's an in-out equation and that's the equation that everybody's focusing on.
But your metabolism impacts that, you know, bathtub.
Basically your metabolism controls the rate at which you're burning energy.
It controls the rate at which you're storing energy.
And it also teaches you about the various valves that control the input and the output
So if we can learn from the genetics, the valves, we can then manipulate those valves.
And even if the environment is feeding you a lot of fat and getting a little of that out,
you can just poke another hole in the bathtub and just get a lot of the fat out.
Yeah, that's fascinating. So we're not just passive observers of our genetics.

The more we understand, the more we can come up with actual treatments. And I think that's an important aspect to realize when people think about strong-effect versus weak-effect variants. Some variants have strong effects. We talked about Mendelian disorders, where a single gene has a sufficiently large effect (penetrance, expressivity, and so on) that you can trace it through families: case, not case, case, not case. So everybody says, oh, those are the genes we should go after, because they're strong-effect genes.

I like to think about it slightly differently. These are the genes where strong-effect genetic changes happened to be tolerated. Every single time we find a genetic association with disease, it depends on two things. Number one, the obvious one: whether the gene has an impact on the disease. Number two, the more subtle one: whether there is standing genetic variation circulating and segregating in the human population that impacts that gene. Some genes are so darn important that if you mess with them even a tiny amount, that person's dead. So those genes don't have variation, and you're not going to find a genetic association if there's no variation. That doesn't mean the gene has no role. It simply means that the gene tolerates no mutations. So the absence of variation is actually a strong signal in itself.
That's so fascinating. So genes that have very little variation are hugely important.

You can actually rank the importance of genes by how little variation they have. And genes that have very little variation but no association with disease are a very good flag that, oh, that's probably a developmental gene, because we're not good at measuring those phenotypes. These are genes where you can tell evolution has excluded mutations, yet we can't see them associated with anything we can measure today. They're probably early embryonic lethal.
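The ranking idea above can be sketched as follows. This is an illustrative assumption, not the group's actual pipeline: each gene gets an observed/expected ratio of damaging variants (in the spirit of population-constraint metrics such as gnomAD's o/e), where lower means more constrained, and constrained genes with no known disease association are flagged as candidate developmental, possibly embryonic-lethal, genes. All gene names, counts, the 0.35 cutoff, and the association flags are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str            # hypothetical gene label
    observed: int        # damaging variants seen in the population sample
    expected: float      # variants expected under a neutral mutation model
    disease_assoc: bool  # does the gene have any known disease association?

    @property
    def oe(self) -> float:
        """Observed/expected ratio: lower means fewer tolerated mutations."""
        return self.observed / self.expected


def rank_by_constraint(genes):
    """Most constrained (lowest o/e) first."""
    return sorted(genes, key=lambda g: g.oe)


def candidate_developmental(genes, oe_cutoff=0.35):
    """Highly constrained genes with no measurable disease association."""
    return [g.name for g in rank_by_constraint(genes)
            if g.oe < oe_cutoff and not g.disease_assoc]


genes = [
    Gene("GENE_A", observed=2, expected=40.0, disease_assoc=False),
    Gene("GENE_B", observed=30, expected=35.0, disease_assoc=True),
    Gene("GENE_C", observed=5, expected=25.0, disease_assoc=True),
    Gene("GENE_D", observed=1, expected=25.0, disease_assoc=False),
]

print([g.name for g in rank_by_constraint(genes)])
print(candidate_developmental(genes))
```

GENE_C is constrained too, but because it already has a disease association it is not flagged; only the constrained genes we cannot link to any measurable phenotype remain.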
What are all the words you just said? Early embryonic what?

Meaning that the embryo will die.

So there's a bunch of stuff that's required for a stable, functional organism across the board, for an entire species, I guess.

If you look at sperm, it expresses thousands of proteins. Does sperm actually need thousands of proteins? No, but it's probably just testing them. My speculation is that misfolding of these proteins is an early test for failure, so that out of the millions of possible sperm, you select the subset that are not grossly misfolding thousands of proteins.

So it's kind of an assert that things are folded correctly, because if this little thing about the folding of a protein isn't correct, that probably means there's a bigger issue somewhere down the line.

That's exactly right. If you look at the mammalian investment in a newborn, it's enormous in terms of resources. So mammals have basically evolved mechanisms to fail fast in those early months of development. It's horrendous, of course, at the personal level when you lose your future child, but in some ways there's so little hope for that child to develop and make it through the remaining months that failing fast is probably a good evolutionary principle for mammals. And of course humans have a lot of medical resources to give those children a chance, and we have much more success giving folks who carry these strong-effect mutations a chance. But if they're not even making it through the first three months, we're not going to see them.
So when we ask which genes are the most important to focus on, the ones with strong-effect mutations or the ones with weak-effect mutations, well, the jury might be out, because the ones with strong-effect mutations basically matter less. Take the genes that only have weak-effect mutations. If genetics tells us that a gene has a weak-effect mutation and also that it plays a causal role in the disease, we can say, okay, great, evolution has only tolerated a 2% change in that gene, but pharmaceutically I can go in and induce a 70% change in it. Maybe then I'll poke another hole in the bathtub, one that was not easy to control through many of the strong-effect genetic variants.

So there's this beautiful map across the population of what you're calling strong and weak effects: genes with a lot of mutations, genes with few mutations, genes with no mutations. And that map lays out the puzzle.

When I say strong effect, I mean at the level of individual mutations. You first have to think about the effect of the gene on the disease. Remember the map I was painting earlier, from genetics all the way to phenotype. A gene can have a strong effect on the disease while the genetic variant has only a weak effect on the gene. So when you ask what the effect of a genetic variant on the disease is, it could be that the variant impacts the gene by a lot and the gene impacts the disease by a little, or that the variant impacts the gene by a little and the gene impacts the disease by a lot.
What we care about is genes that impact the disease a lot, but genetics gives us the full equation. What I would argue is that if we couple genetics with expression variation, to ask which genes change and which genes correlate with disease by a lot even when the genetic variants change them by only a little, then those are the best places to intervene. Those are the places where even a modest pharmaceutical effect will have a strong effect on the disease. Whereas for the genes behind variants with a huge effect on the disease, I might not be able to change the gene that much without affecting all kinds of other things.
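One common way to formalize this two-step picture (an assumption here; the conversation doesn't name a method) is the Mendelian-randomization Wald ratio. If a variant shifts a gene's expression by beta_vg and shifts disease risk by beta_vd, and it acts on disease only through that gene, then beta_vd = beta_vg × beta_gd, so the gene-to-disease effect can be estimated as beta_gd = beta_vd / beta_vg. The sketch uses invented effect sizes for three hypothetical genes and ranks them by that implied per-unit effect, i.e. by how much leverage a drug that moves the gene would have.

```python
# Hypothetical per-variant summary statistics (not real data):
#   beta_vg: effect of the lead variant on the gene's expression
#   beta_vd: effect of the same variant on disease (e.g. log-odds)
summary_stats = {
    "GENE_X": {"beta_vg": 0.90, "beta_vd": 0.09},  # variant moves gene a lot
    "GENE_Y": {"beta_vg": 0.05, "beta_vd": 0.04},  # variant barely moves gene
    "GENE_Z": {"beta_vg": 0.40, "beta_vd": 0.02},
}


def wald_ratio(beta_vd: float, beta_vg: float) -> float:
    """Implied effect of one unit of gene expression on disease.

    Valid only under Mendelian-randomization assumptions: the variant
    affects disease solely through this gene (no pleiotropy).
    """
    return beta_vd / beta_vg


def rank_intervention_targets(stats):
    """Genes sorted by |gene -> disease| effect, largest first."""
    scored = {g: wald_ratio(s["beta_vd"], s["beta_vg"]) for g, s in stats.items()}
    return sorted(scored, key=lambda g: abs(scored[g]), reverse=True)


print(rank_intervention_targets(summary_stats))
```

GENE_Y is the "weak variant, strong gene" case: natural variation only nudges it, yet it has the most leverage per unit of change, which is where a drug that moves the gene substantially could matter most.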
So that's what we're looking at.

What have we been able to find in terms of which diseases could be helped?

Again, don't get me started. We have found so much. Our understanding of disease has changed so dramatically with genetics, implicating places that we had no idea would be involved. One of the worst things about my genome is that I have a genetic predisposition to age-related macular degeneration, AMD. It's a form of blindness that progressively takes away the central part of your vision as you grow older. My increased risk is fairly small: I have an 8% chance, whereas the typical person has about a 6% chance.
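The 8% versus 6% comparison is easy to misread, so here is the arithmetic as a small sketch: the same two numbers give a 2-percentage-point absolute increase but roughly a 1.33× relative risk. Only the two percentages come from the conversation; the function names are illustrative.

```python
from fractions import Fraction


def absolute_increase(personal: Fraction, baseline: Fraction) -> Fraction:
    """Difference in lifetime risk, in probability units."""
    return personal - baseline


def relative_risk(personal: Fraction, baseline: Fraction) -> Fraction:
    """How many times more likely than the baseline."""
    return personal / baseline


personal = Fraction(8, 100)   # 8% lifetime risk quoted for the speaker
baseline = Fraction(6, 100)   # 6% quoted as the typical risk

print(f"absolute increase: {float(absolute_increase(personal, baseline)) * 100:.0f} percentage points")
print(f"relative risk: {float(relative_risk(personal, baseline)):.2f}x")
```

Using `Fraction` keeps the arithmetic exact, so the relative risk is exactly 4/3 rather than a rounded float.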
By the way, when you say "my," you literally mean yours. You know this about yourself.

I know this about me.

Which, philosophically speaking, is a pretty powerful thing to live with. For the listeners, by the way, we agreed to talk again: we're going to try to focus on science today and a little bit of philosophy next time. But it's interesting to think about how, the more you're able to learn about yourself from your genetic information in terms of disease, the more that changes your own view of life.

So there's a lot of impact there, and there's something called genetic exceptionalism, which treats genetics as something very, very different from everything else, as a kind of determinism. And you know, let's talk about that next time.

That's a good preview.

So let's go back to AMD. With AMD, we had no idea what caused it. It was a mystery until the genetics were worked out. And now, number one, the fact that I know I have a predisposition allows me to make some life choices. But number two, the genes that lead to that predisposition give us insight into how the disease actually works. And that's a place where genetics gave us something totally unexpected.
There's a complement pathway, an immune-function pathway, that showed up in most of the loci associated with AMD. And that basically told us, wow, there's an immune basis to this eye disorder that people had just not expected.

Complement was also recently implicated in schizophrenia, through microglia, a cell type involved in synaptic pruning. Synapses are the connections between neurons. In this whole use-it-or-lose-it view of cognition and other capabilities, you have microglia, immune cells that are constantly traversing your brain and pruning synaptic connections that are not being used. In schizophrenia, there's thought to be a change in the pruning: if you don't prune your synapses the right way, you will actually have an increased role of … This was completely unexpected for schizophrenia. Of course, we knew it had to do with neurons, but the role of the complement complex, which is implicated in AMD and now also in schizophrenia, was a huge surprise.

What's the complement complex?

It's basically a set of genes, the complement genes, that have various immune functions. And as I was saying earlier, our immune system has been co-opted for many different roles, so these genes actually play many diverse roles.

And somehow the immune system is connected to the synaptic pruning process. So the immune cells were co-opted to prune synapses. How did you figure this out? How does one go about figuring out this intricate pipeline of connections?
Let me give you another example. Alzheimer's disease: the first place you would expect it to act is obviously the brain.

We had this Roadmap Epigenomics Consortium view of the human epigenome, the largest map of the human epigenome ever built, across 127 different tissues and samples, with dozens of epigenomic marks measured in hundreds of donors. What we learned through that is that you can map the active gene regulatory elements for every one of the tissues in the body. So we built these maps of which regions of the human genome are turned on in every one of the different tissues. We can then go back and ask: where are all of the genetic loci that are associated with disease? My group, I think, was the first to do this, back in 2010 in the Ernst Nature Biotechnology paper, where we were for the first time able to show that specific chromatin states, specific epigenomic states (in that case enhancers), were in fact enriched in disease-associated variants. We pushed that further in the Ernst Nature paper a year later, and then in the Roadmap Epigenomics paper a few years after that. That matrix you mentioned earlier was in fact the first time we could see which traits have genetic variants enriched in which tissues of the body.
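The core of the tissue-enrichment idea can be sketched as a simple overlap test: if a tissue's active enhancers cover a fraction p of the genome, roughly that fraction of a trait's associated variants should land in them by chance, so an observed fraction well above p suggests the trait acts in that tissue. This is a simplified illustration, not the published Roadmap or EpiMap method (which also has to deal with linkage disequilibrium, variant density, and more); the counts and coverage fractions are invented, and the binomial tail is a stand-in significance test.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more overlaps by luck."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment(hits: int, n_variants: int, coverage: float):
    """Fold enrichment of trait variants in a tissue's enhancers, plus tail p-value.

    hits:       trait-associated variants falling inside the tissue's enhancers
    n_variants: total trait-associated variants tested
    coverage:   fraction of the genome covered by that tissue's enhancers
    """
    fold = (hits / n_variants) / coverage
    return fold, binomial_tail(hits, n_variants, coverage)


# Hypothetical numbers for one trait across three tissues.
n_variants = 200
tissues = {
    "tissue_1": (40, 0.05),  # (hits, genome coverage of enhancers)
    "tissue_2": (11, 0.05),
    "tissue_3": (9, 0.04),
}

for name, (hits, coverage) in tissues.items():
    fold, pval = enrichment(hits, n_variants, coverage)
    print(f"{name}: fold={fold:.2f}, p={pval:.2e}")
```

Repeating this for every trait and every tissue fills in the trait-by-tissue matrix; a row with a single strong hit looks like the type 2 diabetes and pancreatic islet pattern, and a row with hits everywhere looks like the broadly enriched disorders discussed later.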
And a lot of that map made complete sense. If you looked at a diversity of immune traits, like allergies and type 1 diabetes, you could see that the genetic variants associated with those traits were enriched in enhancers, these gene regulatory elements, active in T cells, B cells, hematopoietic stem cells, and so on. That gave us confirmation in many ways that those immune traits were indeed acting in immune cells. If you looked at type 2 diabetes, you saw an enrichment in only one type of sample: pancreatic islets. And we know that type 2 diabetes sort of stems from the dysregulation of insulin in the beta cells of pancreatic islets. So that was spot on, super precise.
If you looked at blood pressure, where would you expect it to act? You know, I don't know, maybe in your metabolism and the ways you process coffee or something. Maybe in your brain, the way you stress out and that increases your blood pressure, et cetera. And it turned out that blood pressure localized specifically to the left ventricle of the heart: the enhancers of the heart's left ventricle contained a lot of genetic variants associated with blood pressure. If you look at height, we found an enrichment specifically in embryonic stem cell enhancers. So the genetic variants predisposing you to be taller or shorter are in fact acting in developmental stem cells, which makes complete sense. If you looked at inflammatory bowel disease, the name itself says inflammatory, which is immune, and bowel disease, which is digestive. And indeed we saw a double enrichment, in both the immune cells and the digestive cells. That told us it's acting through both components: there's an immune component to inflammatory bowel disease and there's a digestive component.
And the big surprise was Alzheimer's. We had seven different brain samples, and we found zero enrichment in them for genetic variants associated with Alzheimer's. This was mind-boggling. Our brains were literally hurting. What's going on is that, in terms of the cell types that make them up, the brain samples are primarily neurons, oligodendrocytes, and astrocytes. So that indicated that genetic variants associated with Alzheimer's were probably not acting in oligodendrocytes, astrocytes, or neurons.

So what could they be acting in? Well, the fourth major cell type is microglia. Microglia are resident immune cells in your brain. They're CD14-positive, CD14 being a cell-surface marker, just like the monocytes circulating in your blood. Microglia are basically resident macrophages sitting in your brain, tissue-specific macrophages. And every one of your tissues, your fat for example, has a lot of macrophages. The M1 versus M2 macrophage ratio has a huge role to play in obesity. So again, these immune cells are everywhere. But through this completely unbiased view of which tissues likely underlie different disorders, we found that Alzheimer's was humongously enriched in microglia, but not at all in the neurons, astrocytes, or oligodendrocytes.

So what are we supposed to make of the tissues involved? Is that simply useful as an indication of propensity for disease, or does it somehow give us a pathway to treatment?
It's very much the second. If you look at the path to therapeutics, you have to start somewhere. What are you going to do? You're going to make assays that manipulate those genes and pathways in those cell types. Before we know the tissue of action, we don't even know where to start; we're basically at a loss. But if you know the tissue of action, and even better, the pathway of action, then you can screen your small molecules not just against the gene but directly against the pathway in that cell type. You can develop a high-throughput, multiplexed robotic system for testing the impact of your favorite molecules, ones you know are safe and efficacious and hit that particular gene, and so on. You can screen those molecules either against a set of genes that act in that pathway or against the pathway directly, using a cellular assay. Then you can go into mice, do experiments, and figure out ways to manipulate these processes, which lets you go back to humans with a clinical trial that says: okay, I was indeed able to reverse these processes in mice; can I do the same thing in humans? So knowing the tissue gives you a path toward treatment, but that's not the end of it. There are many additional steps to figuring out the mechanism of disease.

So that's really promising.
Maybe to take a small step back: you've mentioned all these puzzles that were figured out with the Nature paper, and a ton of diseases, from obesity to Alzheimer's, even schizophrenia. What is the actual methodology for figuring this out?

Indeed, I mentioned a lot of diseases, and my lab works on a lot of different disorders. The reason is that if you look at biology, it used to be zoology departments and botany departments and virology departments and so on. MIT was one of the first schools to create a biology department, like, oh, suddenly we're going to study all of life. Why was that even the case? Because the advent of DNA and the genome, and the central dogma (DNA makes RNA makes protein), in many ways unified biology.
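As a toy illustration of the central dogma mentioned here (standard textbook biology, not anything specific to the speaker's work): transcription copies the coding strand of DNA into mRNA with U in place of T, and translation reads the mRNA three bases at a time from a start codon until a stop codon. The codon table is deliberately partial, holding only the few standard-code entries the example needs; a real translator would use all 64 codons. The DNA sequence is made up.

```python
# A small subset of the standard genetic code (mRNA codon -> amino acid).
# Only the entries this example needs; the full table has 64 codons.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (same sequence, T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon not in subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "CCATGTTTGGCAAATGGTAAGC"
mrna = transcribe(dna)
print(mrna)
print(translate(mrna))
```

The point of the unification argument is that these same two functions, with the same table, apply whether the sequence comes from a virus, a bacterium, yeast, a fly, or a human.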
You could suddenly study the process of transcription in viruses or bacteria and have a huge impact on yeast and fly and maybe even mammals, because of this realization that they share common underlying processes. And in the same way that DNA unified biology, genetics is unifying disease studies. You used to have, I don't know, a cardiovascular disease department, a neurological disease department, a neurodegeneration department, immune, cancer, and so on. All of these were studied in different labs, and it made sense, because the first step was understanding how the tissue functions, and we kind of knew the tissues involved in cardiovascular disease and so on. But what's happening with human genetics is that all of these walls and edifices we had built are crumbling, because genetics is revealing unexpected connections. So suddenly we have to bring the immunologists in to work on Alzheimer's. They were never in the room; they were in another building altogether. In the same way, for schizophrenia, we now have to worry about all these interconnected … For metabolic disorders, we're finding contributions from the brain, so suddenly we have to call the neurologists from the other building, and so on.
So in my view, it makes no sense anymore to say, oh, I'm a geneticist studying … I mean, that's ridiculous, because, of course, in many ways you still need to … But what we're doing is saying we'll go wherever the genetics takes us. We're building these massive resources: our latest map, the next generation of the Roadmap Epigenomics map, which we now call EpiMap, covers 833 different tissues. And using those, we've found enrichments in 540 different disorders. Those enrichments are not like, oh great, you guys work on that and we'll work on this. They're amazingly intertwined. Of course there's a lot of modularity, but there are enhancers that are broadly active and disorders that are broadly active. Some enhancers are active in all tissues, and some disorders are enriched everywhere. So there are the multifactorial diseases, and this other class, which I like to call polyfactorial diseases, that are lighting up everywhere. In many ways they cut across the walls that were previously built between these departments.

And the previous departmental structure probably wasn't equipped to deal with the polyfactorial ones.
I mean, again, maybe it's a romanticized question, but in physics there's a theory of everything. Do you think it's possible to move toward an almost theory of everything of disease from a genetic perspective? If this unification continues, do you think in those terms, of trying to arrive at a fundamental understanding of how disease emerges, period?

That unification is not just foreseeable, it's inevitable. We have to go there. You cannot be a specialist anymore. If you're a genomicist, you have to be a specialist in every single disorder. The reason is that the fundamental understanding of the circuitry of the human genome that you need to solve schizophrenia is hugely important for solving Alzheimer's. That same circuitry is hugely important for solving metabolic disorders, and that same exact circuitry is hugely important for solving immune disorders and cancer and, you know, every single disease. So all of them share the same subtask.
And I teach dynamic programming in my class. Dynamic programming is all about not redoing the work: you do the work once and reuse it. So for us to say, oh great, you guys in the immune building go solve the fundamental circuitry of everything, and you guys in the schizophrenia building go solve the fundamental circuitry of everything separately, is crazy.
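The dynamic-programming analogy can be made concrete with memoization: a shared subproblem is computed once, cached, and reused by every caller. In the sketch, the expensive shared step stands in for "solving the circuitry", and several disease analyses reuse it. Everything here (function names, tissue labels, the call counter) is hypothetical and only illustrates the reuse pattern via Python's `functools.lru_cache`.

```python
from functools import lru_cache

calls = {"circuitry": 0}  # counts how often the expensive step actually runs


@lru_cache(maxsize=None)
def build_circuitry(tissue: str) -> str:
    """Stand-in for an expensive, shared computation (hypothetical)."""
    calls["circuitry"] += 1
    return f"regulatory circuitry for {tissue}"


def analyze(disease: str, tissues: tuple) -> list:
    """Each disease analysis reuses circuitry instead of rebuilding it."""
    return [f"{disease}: uses {build_circuitry(t)}" for t in tissues]


analyze("Alzheimer's", ("microglia", "neurons"))
analyze("schizophrenia", ("neurons", "microglia"))
analyze("type 1 diabetes", ("T cells",))

# Five circuitry requests, but only three distinct tissues were computed.
print(calls["circuitry"])
```

The "circuitry building" in the org-chart sense plays the role of the cache: each disease group calls into it, and nothing is solved twice.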
What we need to do is come together and have a circuitry group, a circuitry building, that tries to solve the circuitry of everything. Then the immune folks will apply this knowledge to all of the disorders associated with immune dysfunction, the schizophrenia folks will interact with both the immune folks and the neuronal folks, and all of them will interact with the circuitry folks, and so on.

That's sort of the current structure of my group, if you wish. We're focusing on the fundamental circuitry, but at the same time we're the users of our own tools, collaborating with many other labs on every one of the disorders we mentioned. We have a heart focus on cardiovascular disease: coronary artery disease, heart failure, and so on. We have an immune focus on several immune disorders. We have a cancer focus on metastatic melanoma and immunotherapy response. We have a psychiatric disease focus on schizophrenia, autism, PTSD, and other psychiatric disorders. We have a neurodegeneration focus on Huntington's disease, ALS, and AD-related disorders like frontotemporal dementia and Lewy body dementia, and of course a huge focus on Alzheimer's itself. We have a metabolic focus on the role of exercise and diet and how they impact metabolic organs and many different tissues across the body. And all of them interface with the circuitry.

The reason for that is another computer science principle: eat your own dog food. If everybody ate their own dog food, dog food would taste a lot better.
The reason Microsoft Excel and Word and PowerPoint were so important and so successful is that the employees working on them were using them for their day-to-day work. You can't just build a circuitry and say, here it is, guys, take the circuitry, we're done, without being users of that circuitry yourselves, because then you go back and improve it.

And we span the whole spectrum: profiling the epigenome, using comparative genomics to find the important nucleotides in the genome, building the basic functional map of the genes in the human genome and of the gene regulatory elements of the genome. Over the years we've written a series of papers on these questions. How do you find human genes in the first place using comparative genomics? How do you find the motifs that are the building blocks of gene regulation using comparative genomics? How do you then find how these motifs come together and act in specific tissues using epigenomics? How do you link regulators to enhancers, and enhancers to their target genes, using epigenomics and regulatory genomics? So through the years we've built all this infrastructure for understanding, as I like to say, every single nucleotide of the human genome and how it acts in every one of the major cell types and tissues of the human body.
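Motif finding is a field of its own, but the basic object is easy to show. A common representation (assumed here; the conversation doesn't specify one) is a position weight matrix of log-odds scores, slid along a sequence to score every window. The 4-position motif, its scores, the 0.0 threshold, and the sequence are all made up for illustration. Real motifs come from databases or discovery tools, and the comparative-genomics methods mentioned above additionally ask whether a motif instance is conserved across species.

```python
# Hypothetical 4-bp motif as a log-odds position weight matrix:
# PWM[i][base] = log2(P(base at position i | motif) / P(base | background)).
PWM = [
    {"A": 1.5, "C": -2.0, "G": -2.0, "T": -1.0},
    {"A": -2.0, "C": 1.8, "G": -1.5, "T": -2.0},
    {"A": -2.0, "C": -1.5, "G": 1.8, "T": -2.0},
    {"A": -1.0, "C": -2.0, "G": -2.0, "T": 1.5},
]


def score_window(window: str) -> float:
    """Sum of per-position log-odds scores for one window."""
    return sum(PWM[i][base] for i, base in enumerate(window))


def scan(sequence: str, threshold: float = 0.0):
    """Return (position, window, score) for windows scoring above threshold."""
    width = len(PWM)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        score = score_window(window)
        if score > threshold:
            hits.append((pos, window, round(score, 2)))
    return hits


print(scan("GGACGTTTACGTAA"))
```

Only the two windows matching the consensus ACGT score above zero here; in real genomes, the hard part is that short motifs match by chance everywhere, which is why conservation and tissue-specific epigenomic activity are used to separate functional instances from noise.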
This is no small task. It's an enormous task that takes the entire field, and it's something my group has taken on along with many other groups. What perhaps sets my group apart is that we have also worked with specialists in every one of these disorders to push our understanding all the way down to disease, and in some cases collaborated with pharma to go all the way to therapeutics, because of our deep understanding of that basic circuitry. And that in turn allows us to improve the circuitry: not just treat it as a black box, but say, okay, we need better cell-type-specific wiring than what we now have at the tissue-specific level. We're focusing on that because we understand the needs from the disease front.

So you have a sense of the entire pipeline. Maybe you can indulge me. One nice question to ask would be: from the scientific perspective, how do you go from knowing nothing about a disease, through the entire pipeline, to actually having a drug or a treatment that cures it?
That's an enormously long path and an enormous challenge. What I'm trying to argue is that it progresses in stages of understanding rather than all at once. The traditional view of biology was that you have one postdoc working on this gene and another postdoc working on that gene, and they'll figure out everything about their gene; that's their job. But we've realized how polygenic these diseases are, so we can't have one postdoc per gene. We now have these cross-cutting needs, and I'm going to describe the path to circuitry along those needs. Every single one of these steps we are now doing in parallel across thousands of …

The first step is a genetic association, and we talked a little bit about the Mendelian path and the polygenic path to that association. The Mendelian path was looking through families to find gene regions, and ultimately genes, underlying particular disorders. The polygenic path is looking at unrelated individuals in this giant genotype-by-phenotype matrix and finding hits where a particular variant impacts disease.
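A single-variant association test is the basic unit of that genotype-by-phenotype scan. The sketch (an illustration, not any specific study's pipeline) compares allele counts between cases and controls for one variant in a 2×2 table and reports the allelic odds ratio and a Pearson chi-square statistic. A real genome-wide study repeats a test like this, usually with regression models and covariates, for millions of variants and then applies a stringent threshold such as the conventional p < 5×10⁻⁸. The counts are invented.

```python
def allelic_test(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Allelic 2x2 test for one variant.

    Counts are alleles (two per person), split by case/control status and by
    whether the allele is the alternate (alt) or reference (ref) version.
    Returns (odds_ratio, chi_square); the chi-square has 1 degree of freedom.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected

    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi_square


# Hypothetical: 1,000 cases and 1,000 controls = 2,000 alleles per group.
odds_ratio, chi_square = allelic_test(case_alt=600, case_ref=1400,
                                      ctrl_alt=500, ctrl_ref=1500)
print(f"OR = {odds_ratio:.3f}, chi-square = {chi_square:.2f}")
```

With these made-up counts the statistic is about 12.5 on 1 degree of freedom (p on the order of 10⁻⁴): nominally significant, but far from genome-wide significance, which is why real studies need very large cohorts to detect weak-effect variants.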
And then we have a connection, not between a gene and a disease, but between a genetic region and a disease. That distinction is not understood by most people, so I'm going to explain it a little more. Why do we have a connection between a genetic region and a disease rather than between a gene and a disease? The reason is that 93% of genetic variants associated with disease don't impact the protein at all. If you look at the human genome, there are about 20,000 genes and 3.2 billion nucleotides. Only 1.5% of the genome codes for proteins; the other 98.5% does not. If you now look at where the disease variants are located, 93% of them fall in the portion outside the genes. Of course, genes are enriched, but only by a factor of three, which means that still 93% of the associated variants fall outside the protein-coding sequence.

Why is that a problem? When a variant falls outside the gene, you don't know which gene is impacted. You can't just say, oh, it's near this gene, let's connect the variant to that gene, because the genome's circuitry very often acts over long ranges. That genetic variant could sit in the intron of one gene. An intron is the stretch between the exons, which are the parts that code for protein: genes are split up into exons and introns, every exon codes for a particular subset of amino acids, and the exons are spliced together to make the final protein.
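The exon/intron description can be shown with a toy splice: given a pre-mRNA and the coordinates of its exons, drop the introns and join the exons in order. The sequence and coordinates are invented, and in a real cell splicing is decided by splice-site signals and the spliceosome rather than by a coordinate list handed in from outside; this only shows the bookkeeping.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exon segments, given as (start, end) 0-based end-exclusive pairs,
    in genomic order, dropping the introns between them."""
    ordered = sorted(exons)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exons must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


def introns(exons: list) -> list:
    """Coordinates of the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(end1, start2) for (_, end1), (start2, _) in zip(ordered, ordered[1:])]


# Hypothetical pre-mRNA: exon1 | intron (GU...AG) | exon2 | intron | exon3
pre_mrna = "AUGGCC" + "GUAAGUCCUAG" + "UUUGGC" + "GUCCAG" + "AAAUAA"
exons = [(0, 6), (17, 23), (29, 35)]
print(splice(pre_mrna, exons))
print(introns(exons))
```

A variant sitting inside one of the intron intervals is transcribed along with the gene and then cut out, which is exactly why its presence there says little about whether it affects that gene.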
So that genetic variant might be sitting in an intron of a gene. It's transcribed with the gene, processed, and then excised, but it might not impact that gene at all. It might actually impact another gene that's a million nucleotides away.

So it's just riding along, even though it has nothing to do with its immediate neighborhood.
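Here is why "nearest gene" can mislead, as a minimal sketch. A variant can be assigned to the gene whose body is closest, or to the genes its region physically contacts according to 3D chromatin-contact data (for example Hi-C). The gene names echo the FTO example that follows, but every coordinate and contact score is hypothetical, chosen only to reproduce the qualitative situation of a variant inside one gene that regulates genes roughly half a megabase to a megabase away.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # hypothetical genomic coordinates (bp), not real chr16 positions
    end: int


def distance(pos: int, gene: Gene) -> int:
    """0 if the position lies inside the gene body, else bp to the nearest end."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def nearest_gene(pos: int, genes: list) -> str:
    """Naive assignment: the gene closest to the variant."""
    return min(genes, key=lambda g: distance(pos, g)).name


def contact_genes(contacts: dict, min_contact: float) -> list:
    """Genes whose promoters contact the variant's region above a cutoff."""
    return sorted(name for name, score in contacts.items() if score >= min_contact)


# Hypothetical layout: the variant sits in the first intron of FTO,
# with IRX3 and IRX5 lying further along the chromosome.
genes = [
    Gene("FTO", 1_000_000, 1_400_000),
    Gene("IRX3", 1_550_000, 1_553_000),
    Gene("IRX5", 2_200_000, 2_204_000),
]
variant_pos = 1_050_000

# Hypothetical contact strengths between the variant's region and each promoter.
contacts = {"FTO": 0.1, "IRX3": 0.8, "IRX5": 0.6}

print(nearest_gene(variant_pos, genes))           # naive assignment
print(contact_genes(contacts, min_contact=0.5))   # contact-based assignment
```

The two rules disagree completely on this toy input, which is the practical point: mapping a variant to its target gene requires measuring the circuitry rather than assuming proximity.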
link |
That's exactly right.
link |
Let me give you an example.
link |
The strongest genetic association with obesity was discovered in this FTO gene, fat and obesity
link |
So this FTO gene was studied ad nauseum.
link |
People did tons of experiments on it.
link |
They figured out that FTO is in fact RNA methylation transferase.
link |
It basically impacts something that we call the epitranscriptome.
link |
Just like the genome can be modified, the transcriptome, the transcript of the genes
link |
And we basically said, oh great, that means that epitranscriptomics is hugely involved
link |
in obesity because that gene FTO is clearly where the genetic locus is at.
link |
My group studied FTO in collaboration with a wonderful team led by Melina Claussnitzer.
link |
And what we found is that this FTO locus, even though it is associated with obesity,
link |
does not implicate the FTO gene.
link |
The genetic variant is in the first intron of the FTO gene, but it controls two genes,
link |
IRX3 and IRX5, which sit up to 1.2 million nucleotides away, several genes away.
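The contrast between "connect the variant to the nearest gene" and "follow the circuitry" can be sketched in a few lines. All coordinates, the `GENE_A` placeholder, and the contact list below are hypothetical; the distances only loosely mirror the roughly 600,000 and 1.2 million nucleotide gaps described for IRX3 and IRX5.

```python
from dataclasses import dataclass

Interval = tuple[int, int]  # half-open [start, end) on one chromosome


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int  # transcription start site (hypothetical coordinate)


def nearest_gene(pos: int, genes: list[Gene]) -> str:
    """Naive assignment: the gene whose TSS is closest to the variant."""
    return min(genes, key=lambda g: abs(g.tss - pos)).name


def linked_genes(pos: int, genes: list[Gene],
                 contacts: list[tuple[Interval, Interval]]) -> list[str]:
    """Genes whose TSS lies in a region that a contact map links to the variant's region."""
    hits = set()
    for (a_start, a_end), (b_start, b_end) in contacts:
        if a_start <= pos < a_end:
            hits.update(g.name for g in genes if b_start <= g.tss < b_end)
    return sorted(hits)


# Hypothetical coordinates, loosely mimicking the distances described above.
genes = [Gene("FTO", 900_000), Gene("GENE_A", 1_300_000),
         Gene("IRX3", 1_600_000), Gene("IRX5", 2_200_000)]
variant = 1_000_000  # sits inside an FTO intron
contacts = [((990_000, 1_010_000), (1_590_000, 1_610_000)),
            ((990_000, 1_010_000), (2_190_000, 2_210_000))]

print(nearest_gene(variant, genes))            # FTO
print(linked_genes(variant, genes, contacts))  # ['IRX3', 'IRX5']
```

The nearest-gene rule picks FTO; only the long-range links point at IRX3 and IRX5, which is the point of the FTO story.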
link |
What am I supposed to feel about that? Isn't that super complicated, then?
link |
So the way that I was introduced at a conference a few years ago was, and here's Manolis Kellis
link |
who wrote the most depressing paper of 2015.
link |
And the reason for that is that the entire pharmaceutical industry was so comfortable
link |
that there was a single gene in that locus.
link |
Because in some loci, you basically have three dozen genes that are all sitting in the same
link |
region of association and you're like, oh gosh, which one of those is it?
link |
But even asking which one of those it is makes the assumption that it is
link |
one of those as opposed to some random gene far, far away, which is what our paper found.
link |
So basically what our paper showed is that you can't ignore the circuitry.
link |
You have to first figure out the circuitry, all of those long range interactions, how
link |
every genetic variant impacts the expression of every gene in every tissue imaginable across
link |
hundreds of individuals.
link |
And then you now have one of the building blocks, not even all of the building blocks
link |
for then going and understanding disease.
link |
So embrace the wholeness of the circuitry.
link |
So back to the question of starting from knowing nothing about the disease and going to the treatment.
link |
So what are the next steps?
link |
So you basically have to first figure out the tissue. Let me describe how you figure that out.
link |
You figure out the tissue by taking all of these non coding variants that are sitting
link |
outside proteins and then figuring out what are the epigenomic enrichments.
link |
And the reason for that, you know, thankfully is that there is convergence, that the same
link |
processes are impacted in different ways by different loci.
link |
And that's a saving grace for our field.
link |
The fact that if I look at hundreds of genetic variants associated with Alzheimer's, they
link |
localize in a small number of processes.
link |
Can you clarify why that's hopeful?
link |
So they show up in the same specific set of processes?
link |
So basically there's a small number of biological processes that underlie, or at least that
link |
play the biggest role in every disorder.
link |
So in Alzheimer's you basically have, you know, maybe 10 different types of processes.
link |
One of them is lipid metabolism.
link |
One of them is immune cell function.
link |
One of them is neuronal energetics.
link |
So these are just a small number of processes, but you have multiple lesions, multiple genetic
link |
perturbations that are associated with those processes.
link |
So if you look at schizophrenia, it's excitatory neuron function, it's inhibitory neuron function,
link |
it's synaptic pruning, it's calcium signaling and so on and so forth.
link |
So when you look at disease genetics, you have one hit here and one hit there and one
link |
hit there and one hit there, completely different parts of the genome.
link |
But it turns out all of those hits are calcium signaling proteins.
link |
That means that calcium signaling is important.
link |
So those people who are focusing on one gene at a time cannot possibly see that picture.
link |
You have to become a genomicist.
link |
You have to look at the omics, the "ome", the holistic picture, to understand these enrichments.
link |
But you mentioned the convergence thing.
link |
Whatever is associated with the disease keeps showing up in the same processes.
link |
So let me explain convergence.
link |
Convergence is such a beautiful concept.
link |
So you basically have these four genes that are converging on calcium signaling.
link |
So that basically means that they are each acting in their own way, but together in the same pathway.
link |
But now in every one of these loci, you have many enhancers controlling each of those genes.
link |
That's another type of convergence where dysregulation of seven different enhancers might all converge
link |
on dysregulation of that one gene, which then converges on calcium signaling.
link |
And in each one of those enhancers, you might have multiple genetic variants distributed
link |
across many different people.
link |
Everyone has their own different mutation.
link |
But all of these mutations are impacting that enhancer.
link |
And all of these enhancers are impacting that gene.
link |
And all of these genes are impacting this pathway.
link |
And all these pathways are acting in the same tissue.
link |
And all of these tissues are converging together on the same biological process of schizophrenia.
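The layered convergence above (variants onto enhancers, enhancers onto genes, genes onto a pathway) can be sketched as a chain of many-to-one lookups followed by a tally. Every identifier below (`v1`, `enh1`, `GENE1`, and so on) is hypothetical; only the pathway names come from the conversation.

```python
from collections import Counter

# Hypothetical many-to-one maps: variants -> enhancers -> genes -> pathways.
variant_to_enhancer = {"v1": "enh1", "v2": "enh1", "v3": "enh2", "v4": "enh3", "v5": "enh4"}
enhancer_to_gene = {"enh1": "GENE1", "enh2": "GENE1", "enh3": "GENE2", "enh4": "GENE3"}
gene_to_pathway = {"GENE1": "calcium signaling", "GENE2": "calcium signaling",
                   "GENE3": "synaptic pruning"}


def roll_up(variants: list[str]) -> Counter:
    """Count how many variants converge on each pathway through enhancer -> gene -> pathway."""
    return Counter(gene_to_pathway[enhancer_to_gene[variant_to_enhancer[v]]] for v in variants)


counts = roll_up(["v1", "v2", "v3", "v4", "v5"])
print(counts.most_common())  # [('calcium signaling', 4), ('synaptic pruning', 1)]
```

Five variants that look scattered at the sequence level collapse onto four enhancers, three genes, and mostly one pathway, which is the "one hit here, one hit there" picture seen from the top.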
link |
And you're saying the saving grace is that that convergence seems to happen for a lot
link |
of these diseases.
link |
Basically that for every single disease that we've looked at, we have found an epigenomic enrichment.
link |
How do you do that?
link |
You basically have all of the genetic variants associated with the disorder.
link |
And then you're asking for all of the enhancers active in a particular tissue.
link |
For 540 disorders, we've basically found that indeed there is an enrichment.
link |
That basically means that there is commonality.
link |
And from the commonality, we can just get insights.
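One way to quantify "there is an enrichment" is a fold enrichment plus a hypergeometric tail probability: how surprising is it that this many disease variants land in one tissue's enhancers if they had been drawn at random? This is a generic sketch of that test, not necessarily the exact statistic used in the work described, and all counts are hypothetical.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(fraction of disease variants in the annotation) / (fraction of all variants in it)."""
    return (k / n) / (K / N)


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k of n disease variants inside the annotation
    if the n were drawn at random from N variants, K of which lie in the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 10,000 background variants, 500 inside enhancers active in one tissue,
# 100 disease-associated variants, 20 of which fall in those enhancers.
N, K, n, k = 10_000, 500, 100, 20
print(fold_enrichment(k, n, K, N))  # about 4-fold
print(hypergeom_sf(k, n, K, N))     # very small p-value
```

Running this per tissue and keeping the most enriched one is the "figure out the tissue" step in miniature.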
link |
So to explain in mathematical terms, we're basically building an empirical prior.
link |
We're using a Bayesian approach to basically say, great, all of these variants are equally
link |
likely in a particular locus to be important.
link |
So in a genetic locus, you basically have a dozen variants that are coinherited.
link |
Because the way that inheritance works in the human genome is through all of these recombination
link |
events during meiosis. Your chromosome
link |
three, for example, is inherited in four different parts.
link |
One part comes from your mom's mom, another part comes from your mom's dad, another part comes from
link |
your dad's mom, and another part comes from your dad's dad.
link |
So you basically have one copy that comes from your dad and one copy that comes from your mom.
link |
But that copy that you got from your mom is a mixture of her maternal and her paternal copies.
link |
And the copy that you got from your dad is a mixture of his maternal and his paternal copies.
link |
So these breakpoints happen when chromosomes are lining up, and through
link |
these crossover events they ensure that every egg or sperm cell produced by meiosis carries a shuffled copy.
link |
Then one sperm cell fuses with one egg cell to
link |
create the zygote.
link |
You basically have half of your genome that comes from dad and half of your genome that comes from mom.
link |
But in order to line them up, you basically have these crossover events.
link |
These crossover events are basically leading to coinheritance of that entire block coming
link |
from your maternal grandmother and that entire block coming from your maternal grandfather.
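The crossover process can be sketched as copying from one homolog and switching to the other at each breakpoint; runs of the same label are the coinherited blocks. The 10-position toy chromosome, the `GM`/`GF` labels, and the hand-picked breakpoints are hypothetical; real crossover counts and positions vary per meiosis.

```python
from itertools import groupby


def recombine(homolog_a: list[str], homolog_b: list[str], breakpoints: list[int]) -> list[str]:
    """Copy from homolog_a, switching to the other homolog at each crossover breakpoint."""
    homologs = (homolog_a, homolog_b)
    result, source, start = [], 0, 0
    for bp in sorted(breakpoints) + [len(homolog_a)]:
        result.extend(homologs[source][start:bp])
        source, start = 1 - source, bp
    return result


def blocks(chromosome: list[str]) -> list[tuple[str, int]]:
    """Run-length summary: which grandparent each contiguous block came from, and its length."""
    return [(label, len(list(run))) for label, run in groupby(chromosome)]


# Mom's two copies of a toy 10-position chromosome, labelled by the grandparent they came from.
grandmother = ["GM"] * 10
grandfather = ["GF"] * 10

egg_copy = recombine(grandmother, grandfather, breakpoints=[3, 7])
print(blocks(egg_copy))  # [('GM', 3), ('GF', 4), ('GM', 3)]
```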
link |
Over many generations, these crossover events don't happen randomly.
link |
There's a protein called PRDM9 that basically guides the double-strand breaks and then
link |
leads to these crossovers.
link |
And that protein has a particular preference for only a small number of hotspots of recombination,
link |
which then lead to a small number of breaks between these coinheritance patterns.
link |
So even though there are 6 million variants at 6 million loci, this variation is
link |
inherited in blocks and every one of these blocks has like two dozen genetic variants
link |
that are all associated.
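Because crossovers concentrate at a few hotspots, variants between two hotspots travel together. A crude way to picture this is to cut the chromosome at hotspot positions and group variants by interval. The coordinates are hypothetical, and this stand-in ignores how linkage disequilibrium is actually estimated (from correlations between variants across many individuals).

```python
from bisect import bisect_right
from collections import defaultdict


def ld_blocks(variant_positions: list[int], hotspots: list[int]) -> list[list[int]]:
    """Group variants into the intervals between consecutive recombination hotspots."""
    cuts = sorted(hotspots)
    grouped = defaultdict(list)
    for pos in sorted(variant_positions):
        grouped[bisect_right(cuts, pos)].append(pos)  # index of the interval containing pos
    return [grouped[i] for i in sorted(grouped)]


# Hypothetical hotspot and variant coordinates.
hotspots = [100, 250]
variants = [10, 50, 120, 130, 200, 300]
print(ld_blocks(variants, hotspots))  # [[10, 50], [120, 130, 200], [300]]
```

Within a block such as `[120, 130, 200]`, association data alone cannot say which variant matters, which is the problem the 89 coinherited FTO variants pose.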
link |
So in the case of FTO, it wasn't just one variant, it was 89 common variants that were
link |
all humongously associated with obesity.
link |
Which one of those is the important one?
link |
Well, if you look at only one locus, you have no idea.
link |
But if you look at many loci, you basically say, aha, all of them are enriching in the
link |
same epigenomic map.
link |
In that particular case, it was mesenchymal stem cells.
link |
So these are the progenitor cells that give rise to your brown fat and your white fat.
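The empirical-prior idea can be sketched as a tiny Bayesian fine-mapping step. It assumes exactly one causal variant per coinherited block, treats every variant's association evidence (Bayes factor) as equal because they are coinherited, and up-weights the prior of variants inside the enriched tissue's enhancers. The four variants, their Bayes factors, and the enrichment value are hypothetical, and `fine_map` is an illustrative helper, not the group's actual method.

```python
def fine_map(bayes_factors: list[float], in_annotation: list[bool],
             enrichment: float) -> list[float]:
    """Posterior probability that each variant in one coinherited block is the causal one.

    Assumes exactly one causal variant per block. Prior weight is 1 for variants outside the
    enriched annotation and `enrichment` for variants inside it (the empirical prior).
    """
    weights = [bf * (enrichment if inside else 1.0)
               for bf, inside in zip(bayes_factors, in_annotation)]
    total = sum(weights)
    return [w / total for w in weights]


# Four hypothetical variants in tight LD: the association data alone cannot separate them.
bfs = [10.0, 10.0, 10.0, 10.0]
in_msc_enhancer = [False, True, False, False]

print(fine_map(bfs, in_msc_enhancer, enrichment=1.0))  # flat prior: 0.25 each
print(fine_map(bfs, in_msc_enhancer, enrichment=4.0))  # [1/7, 4/7, 1/7, 1/7]
```

With a flat prior every variant is equally likely; the prior learned from many loci is what breaks the tie.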
link |
Progenitors are like the early developmental stem cells?
link |
So you start from one zygote and that's a totipotent cell type.
link |
It can do anything.
link |
You then, you know, that cell divides, divides, divides, and then every cell division is leading
link |
to specialization where you now have a mesodermal lineage and ectodermal lineage and endodermal
link |
lineage that basically leads to different parts of your body.
link |
The ectoderm will basically give rise to your skin, ecto means outside, derm is skin.
link |
So ectoderm, but it also gives rise to your neurons and your whole brain.
link |
So that's a lot of ectoderm.
link |
Mesoderm gives rise to many of your internal tissues, including the vasculature and, you know, your
link |
muscle and stuff like that.
link |
So you basically have this progressive differentiation and then if you look further, further down
link |
that lineage, you basically have one lineage that will give rise to both your muscle and
link |
your bone, but also your fat.
link |
And if you go further down the lineage of your fat, you basically have your white fat
link |
These are the cells that store energy.
link |
So when you eat a lot, but you don't exercise too much, there's an excess of calories.
link |
What do you do with those?
link |
You basically create, you spend a lot of that energy to create these high energy molecules,
link |
lipids, which you can then burn when you need them on a rainy day.
link |
So that leads to obesity if you don't exercise and if you overeat because your body's like,
link |
oh great, I have all these calories.
link |
I'm going to store them.
link |
Ooh, more calories.
link |
I'm going to store them too.
link |
Ooh, more calories.
link |
So basically 42% of European chromosomes carry a predisposition to storing fat, which
link |
was probably selected during periods of food scarcity, basically as we were exiting
link |
Africa before and during the ice ages; there was probably selection for those individuals
link |
who made it north to be able to store energy, a lot more energy.
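The 42% figure is per chromosome, not per person. Under the extra assumption of Hardy-Weinberg equilibrium (random mating, which the conversation does not claim), a quick calculation turns that allele frequency into the share of people carrying zero, one, or two copies.

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Genotype frequencies for an allele at frequency p, assuming Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return {"two copies": p * p, "one copy": 2 * p * q, "no copies": q * q}


g = hardy_weinberg(0.42)
print({k: round(v, 4) for k, v in g.items()})
# {'two copies': 0.1764, 'one copy': 0.4872, 'no copies': 0.3364}
print(round(1 - g["no copies"], 4))  # 0.6636 carry at least one copy
```

So, under this assumption, roughly two in three people of European ancestry would carry at least one copy, and about one in six would carry two.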
link |
So you basically now have this lineage that is deciding whether you want to store energy
link |
in your white fat or burn energy in your beige fat.
link |
It turns out that your fat is, you know, like we have such a bad view of fat.
link |
Fat is your best friend.
link |
Fat can both store all these excess lipids that would be otherwise circulating through
link |
your body and causing damage, but it can also burn calories directly.
link |
If you have too much energy, you can just choose to just burn some of that as heat.
link |
So basically when you're cold, you're burning energy to basically warm your body up and
link |
you're burning all these lipids and you're burning all these calories.
link |
So what we basically found is that across the board, genetic variants associated with
link |
obesity across many of these regions were all enriched repeatedly in mesenchymal stem cells.
link |
So that gave us a hint as to which of these genetic variants was likely driving this whole association.
link |
And we ended up with this one genetic variant called rs1421085.
link |
And that genetic variant out of the 89 was the one that we predicted to be causal for the association.
link |
So going back to those steps, first step is figure out the relevant tissue based on the
link |
global enrichment.
link |
Second step is figure out the causal variant among the many variants in this linkage disequilibrium block,
link |
this coinherited block between recombination hotspots, the boundaries of these inherited blocks.
link |
That's the second step.
link |
The third step is once you know that causal variant, try to figure out what is the motif
link |
that is disrupted by that causal variant.
link |
Basically how does it act?
link |
Variants don't just disrupt elements, they disrupt the binding of specific regulators.
link |
So basically the third step there was how do you find the motif that is responsible
link |
like the gene regulatory word, the building block of gene regulation that is responsible
link |
for that dysregulatory event.
link |
And the fourth step is finding out what regulator normally binds that motif and is now no longer able to bind it.
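Steps three and four hinge on asking whether a single-nucleotide change weakens a regulator's binding motif. A standard way to score this is a position weight matrix: sum the log-odds of each base under the motif versus background, then compare the best-scoring window for the reference and alternate alleles. The 4-base AT-rich motif, its probabilities, and both sequences are hypothetical; this is not ARID5B's real motif, and only the forward strand is scanned.

```python
import math

# Hypothetical position frequency matrix for a 4-base AT-rich motif
# (probability of each base at each motif position; each row sums to 1).
PFM = [
    {"A": 0.70, "C": 0.05, "G": 0.05, "T": 0.20},
    {"A": 0.20, "C": 0.05, "G": 0.05, "T": 0.70},
    {"A": 0.20, "C": 0.05, "G": 0.05, "T": 0.70},
    {"A": 0.70, "C": 0.05, "G": 0.05, "T": 0.20},
]
BACKGROUND = 0.25  # uniform base composition


def site_score(site: str) -> float:
    """Log2-odds that `site` was generated by the motif rather than by the background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PFM, site))


def best_match(seq: str) -> float:
    """Highest motif score over every window of `seq` (forward strand only)."""
    width = len(PFM)
    return max(site_score(seq[i:i + width]) for i in range(len(seq) - width + 1))


ref = "GCATTAGC"  # hypothetical reference allele: T at index 4
alt = "GCATCAGC"  # hypothetical alternate allele: C at index 4
print(round(best_match(ref), 2), round(best_match(alt), 2))  # 5.94 2.13
```

A large drop in the best score between alleles is the signature of a variant that disrupts a regulator's binding site.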
link |
And then once you have the regulator, can you then try to figure out
link |
how to fix it?
link |
That's exactly right.
link |
You now know how to intervene.
link |
You have basically a regulator, you have a gene that you can then perturb and you say,
link |
well, maybe that regulator has a global role in obesity.
link |
I can perturb the regulator.
link |
Just to clarify, when we say perturb, like on the scale of a human life, can a human
link |
I guess understanding is the first step.
link |
No, no, but perturb basically means you now develop therapeutics, pharmaceutical therapeutics.
link |
Or you develop other types of intervention that affect the expression of that gene.
link |
What do pharmaceutical therapeutics look like when your understanding is on a genetic level?
link |
Sorry if it's a dumb question.
link |
It's a brilliant question, but I want to save it for a little bit later when we start talking
link |
about therapeutics.
link |
So let's talk about the first four steps.
link |
So basically the first step is figure out, I mean, the zero step, the starting point
link |
The first step after that is figure out the tissue of action.
link |
The second step is figuring out the nucleotide that is responsible or set of nucleotides.
link |
The third step is figuring out the motif, and number four is the upstream regulator.
link |
Numbers five and six: what are the targets?
link |
So for number five: great.
link |
Now I know the regulator.
link |
I know the tissue and I know the variant.
link |
What does it actually do?
link |
So you have to now trace it to the biological process and the genes that mediate that biological process.
link |
So knowing all of this can now allow you to find the target genes.
link |
By basically doing perturbation experiments or by looking at the folding of the epigenome
link |
or by looking at the genetic impact of that genetic variant on the expression of genes.
link |
And we use all three.
link |
So let me go through them.
link |
Basically one of them is physical links.
link |
This is the folding of the genome onto itself.
link |
How do you even figure out the folding?
link |
It's a little bit of a tangent, but it's a super awesome technology.
link |
Think of the genome as again, this massive packaging that we talked about of taking two
link |
meters worth of DNA and putting it in something that's a million times smaller than two meters
link |
That's a single cell.
link |
You basically have this massive packaging and this packaging basically leads to the
link |
chromosome being wrapped around in sort of tight, tight ways in ways, however, that are
link |
functionally capable of being reopened and reclosed.
link |
So I can then go in and figure out that folding by sort of chopping up the spaghetti soup,
link |
putting glue and ligating the segments that were chopped up but nearby each other, and
link |
then sequencing through these ligation events to figure out that this region of this chromosome,
link |
that region of the chromosome were near each other.
link |
That means they were interacting even though they were far away on the genome itself.
link |
So that chopping up, sequencing and reglueing is basically giving you folds of the genome
link |
Sorry, can you backtrack?
link |
How does cutting it help you figure out which ones were close in the original folding?
link |
So you have a bowl of noodles.
link |
And in that bowl of noodles, some noodles are near each other.
link |
So you throw in a bunch of glue, you basically freeze the noodles in place, throw in a cutter
link |
that chops up the noodles into little pieces.
link |
Now throw in some ligation enzyme that lets those pieces that were freed religate with whatever is nearby.
link |
In some cases, they religate what you had just cut, but that's very rare.
link |
Most of the time they will religate in whatever was proximal.
link |
You now have glued the red noodle that was crossing the blue noodle to each other.
link |
You then reverse the glue, the glue goes away and you just sequence the heck out of it.
link |
Most of the time you'll find red segment with, you know, red segment, but you can specifically
link |
select for ligation events that have happened that were not from the same segment by sort
link |
of marking them in a particular way and then selecting those and then you sequence and
link |
you look for red-with-blue matches, things that were glued together that were not immediately
link |
proximal to each other.
link |
And that reveals the linking of the blue noodle and the red noodle.
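Once the ligation products are sequenced, each read pair gives two genomic positions that were glued together. A toy version of the counting step bins those positions and tallies pairs, discarding pairs whose ends are too close to be informative (in Hi-C, the junctions themselves are marked with biotin so they can be pulled down). The read pairs, bin size, and distance cutoff are hypothetical, and real pipelines also handle mapping, restriction fragments, and normalization.

```python
from collections import Counter


def contact_counts(read_pairs: list[tuple[int, int]], bin_size: int,
                   min_distance: int) -> Counter:
    """Count ligation junctions between genomic bins from the two mapped ends of each read pair."""
    counts = Counter()
    for a, b in read_pairs:
        if abs(a - b) < min_distance:
            continue  # ends too close: likely the same segment re-ligated, not a true contact
        i, j = sorted((a // bin_size, b // bin_size))
        counts[(i, j)] += 1
    return counts


# Hypothetical mapped positions of the two halves of each sequenced ligation product.
pairs = [(100, 150), (1_200, 9_500), (1_300, 9_800), (9_700, 1_100), (4_000, 6_000)]
print(contact_counts(pairs, bin_size=1_000, min_distance=1_000))
# Counter({(1, 9): 3, (4, 6): 1})
```

Bins 1 and 9 are far apart on the sequence but keep showing up glued together: that is a red noodle crossing a blue noodle.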
link |
You're with me so far?
link |
So we've done these experiments.
link |
That's the physical.
link |
That's the physical.
link |
That's step one of the physical.
link |
And what the physical revealed is topologically associating domains, basically big blocks of
link |
the genome that are topologically connected together.
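Domains like these show up in a contact matrix as squares of high contact along the diagonal, and their boundaries can be located with an insulation-style score: the mean contact between bins just left and just right of a position, which dips at a boundary. This is one simple variant of that idea, run on a hypothetical two-domain matrix rather than real data.

```python
def toy_contact_matrix(domain_sizes: list[int], within: float = 10.0,
                       between: float = 1.0) -> list[list[float]]:
    """Symmetric matrix: high contact frequency inside a domain, low across domains."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    n = len(labels)
    return [[within if labels[a] == labels[b] else between for b in range(n)] for a in range(n)]


def insulation(matrix: list[list[float]], w: int) -> dict[int, float]:
    """Mean contact between the w bins left of position i and the w bins starting at i."""
    n = len(matrix)
    scores = {}
    for i in range(w, n - w + 1):
        window = [matrix[a][b] for a in range(i - w, i) for b in range(i, i + w)]
        scores[i] = sum(window) / len(window)
    return scores


scores = insulation(toy_contact_matrix([4, 4]), w=2)
boundary = min(scores, key=scores.get)
print(scores)    # {2: 10.0, 3: 5.5, 4: 1.0, 5: 5.5, 6: 10.0}
print(boundary)  # 4, i.e. the domain boundary falls between bins 3 and 4
```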
link |
That's the physical.
link |
The second one is the genetic links.
link |
It basically says across individuals that have different genetic variants, how are their
link |
genes expressed differently?
link |
Remember before I was saying that the path between genetics and disease is enormous,
link |
but we can break it up to look at the path between genetics and gene expression.
link |
So instead of using Alzheimer's as a phenotype, I can now use expression of IRX3 as the phenotype,
link |
expression of gene A. And I can look at all of the humans who contain a G at that location
link |
and all the humans that contain a T at that location and basically say, wow, it turns
link |
out that the expression of that gene is higher for the T humans than for the G humans at that location.
link |
So that basically gives me a genetic link between a genetic variant, a locus, a region,
link |
and the expression of nearby genes.
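The genetic link can be sketched as a regression of expression on genotype, coding each person by how many T alleles they carry (0, 1, or 2). The individuals and expression values below are hypothetical; real expression quantitative trait locus (eQTL) mapping also adjusts for covariates and corrects for testing many variant-gene pairs.

```python
def eqtl_slope(dosage: list[int], expression: list[float]) -> float:
    """Least-squares slope of expression on allele dosage (number of T alleles: 0, 1 or 2)."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    return sxy / sxx


# Hypothetical individuals: genotype GG -> 0, GT -> 1, TT -> 2 copies of T.
dosage = [0, 0, 1, 1, 2, 2]
irx3_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(round(eqtl_slope(dosage, irx3_expression), 3))  # 1.0 extra expression unit per T allele
```

A slope clearly different from zero is what "expression is higher for the T humans than for the G humans" means quantitatively.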
link |
Good on the genetic link?
link |
The third link is the activity link.
link |
What's an activity link?
link |
It basically says if I look across 833 different epigenomes, whenever this enhancer is active,
link |
this gene is active.
link |
That gives me an activity link between this region of the DNA and that gene.
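The activity link is a correlation across samples: if an enhancer's activity and a gene's expression rise and fall together across many epigenomes, the two are candidates for being linked. The five samples and their values are hypothetical stand-ins for the 833 epigenomes mentioned, and Pearson correlation is used here as one simple choice of similarity measure.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical enhancer activity and gene expression across five epigenomes.
enhancer_activity = [0.10, 0.90, 0.20, 0.80, 0.05]
gene_expression = [1.1, 8.7, 2.4, 7.9, 0.6]
print(round(pearson(enhancer_activity, gene_expression), 3))
```

As the conversation notes next, this is correlational; a perturbation experiment is what tests whether the enhancer actually drives the gene.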
link |
And then the fourth one is perturbations where I can go in and blow up that region and see
link |
what are the genes that change in expression, or I can go in and over activate that region
link |
and see what genes change in expression.
link |
So I guess that's similar to activity?
link |
So that's basically similar to activity.
link |
I agree, but it's causal rather than correlational.
link |
Again, I'm a little weird.
link |
No, no, you're 100% on.
link |
It's exactly the same as the perturbation where I go in and intervene.
link |
I basically take a bunch of cells.
link |
So you know CRISPR, right?
link |
CRISPR is this genome guidance and cutting mechanism.
link |
That's what George Church likes to call genome vandalism.
link |
So you basically are able to, you can basically take a guide RNA that you put into the CRISPR
link |
system, and the CRISPR system will basically use this guide RNA, scan the genome, find
link |
wherever there's a match, and then cut the genome.
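The "scan the genome, find a match, cut" behaviour can be sketched for the commonly used SpCas9 system: the guide's 20-nucleotide spacer must match the DNA, that match must be followed by an NGG PAM, and the cut is made 3 bp upstream of the PAM. The sequences are hypothetical, only the forward strand is searched, and real Cas9 can also tolerate some mismatches, which this exact-match sketch ignores.

```python
def find_cut_sites(genome: str, guide_rna: str) -> list[int]:
    """Forward-strand cut positions for an SpCas9-style guide: a full match followed by an NGG PAM.

    Returns the index of the first base 3' of each blunt cut, made 3 bp upstream of the PAM.
    """
    # The guide's spacer has the same sequence as the protospacer DNA, with U in place of T.
    protospacer = guide_rna.upper().replace("U", "T")
    k = len(protospacer)
    sites = []
    for i in range(len(genome) - k - 2):
        pam = genome[i + k:i + k + 3]
        if genome[i:i + k] == protospacer and pam[1:] == "GG":
            sites.append(i + k - 3)
    return sites


protospacer = "GACGTTACCGGATCAAGCTT"  # hypothetical 20-nt target
genome = "AAAA" + protospacer + "TGG" + "CCCC" + protospacer + "TCA"
guide = protospacer.replace("T", "U")
print(find_cut_sites(genome, guide))  # [21]; the second copy lacks an NGG PAM
```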
link |
So I digress, but it's a bacterial immune defense system.
link |
So basically bacteria are constantly attacked by viruses, but sometimes they win against
link |
the viruses and they chop up these viruses.
link |
And they remember them as trophies inside their genome: they have these loci, these CRISPR loci, where CRISPR
link |
stands for clustered regularly interspaced short palindromic repeats.
link |
So basically it's an interspersed repeat structure where you have a set of
link |
repetitive regions, and interspersed between them are variable segments that were basically taken from viruses.
link |
So when this was first discovered, it was basically hypothesized that this is probably
link |
a bacterial immune system that remembers the trophies of the viruses that it managed to kill.
link |
And then the bacteria pass on, you know, they sort of do lateral transfer of DNA and they
link |
pass on these memories so that the next bacterium says, Ooh, you killed that guy.
link |
When that guy shows up again, I will recognize him.
link |
And the CRISPR system was basically evolved as a bacterial adaptive immune response to
link |
sense foreigners that should not belong and to just go and cut their genome.
link |
So it's an RNA guided RNA cutting enzyme or an RNA guided DNA cutting enzyme.
link |
So there's different systems.
link |
Some of them cut DNA, some of them cut RNA, but all of them remember this sort of viral sequence.
link |
So what we have done now as a field, through the work of Jennifer
link |
Doudna, Emmanuelle Charpentier, Feng Zhang and many others, is co-opt that system of bacterial
link |
immune defense as a way to cut genomes.
link |
You basically have this guiding system that allows you to use an RNA guide to bring enzymes
link |
to cut DNA at a particular locus.
link |
That's so fascinating.
link |
So this is already a natural mechanism, a natural tool for cutting that's useful in a
link |
particular context.
link |
And we're like, well, we can use that thing to actually, it's a nice tool that's already
link |
It's not in our body.
link |
It's in the bacterial body.
link |
It was discovered by the yogurt industry.
link |
They were trying to make better yogurts and they were trying to make their bacteria in
link |
their yogurt cultures more resilient to viruses.
link |
And they were studying bacteria and they found that, wow, this CRISPR system is awesome.
link |
It allows you to defend against that.
link |
And then it was coopted in mammalian systems that don't use anything like that as a targeting
link |
way to basically bring these DNA cutting enzymes to any locus in the genome.
link |
Why would you want to cut DNA to do anything?
link |
The reason is that our DNA has a DNA repair mechanism where if a region of the genome
link |
gets randomly cut, you will basically scan the genome for anything that matches and sort
link |
of use it by homology.
link |
So the reason why we're diploid is because we now have a spare copy.
link |
As soon as my mom's copy is deactivated, I can use my dad's copy.
link |
And somewhere else, if my dad's copy is deactivated, I can use my mom's copy to repair it.
link |
So this is called homology-directed repair.
link |
So all you have to do is the cutting and you don't have to do the fixing.
link |
That's exactly right.
link |
You don't have to do the fixing.
link |
Because it's already built in.
link |
That's exactly right.
link |
But the fixing can be coopted by throwing in a bunch of homologous segments that instead
link |
of having your dad's version, have whatever other version you'd like to use.
link |
So you then control the fixing by throwing in a bunch of other stuff.
link |
That's exactly right.
link |
And that's how you do genome editing.
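Homology-directed repair can be sketched as: check that the template's ends (homology arms) match the sequence on both sides of the break, then copy the template across the break. Using the other parental copy as template restores the original; a designed donor with something different between the arms writes in an edit. The sequences and the 4-base arms are hypothetical toy values (real homology arms are much longer), and `hdr_repair` is an illustrative helper.

```python
def hdr_repair(seq: str, cut: int, donor: str, arm: int) -> str:
    """Repair a double-strand break at `cut` by copying a donor template.

    The donor's first and last `arm` bases must match the sequence on either side of the cut;
    whatever sits between those homology arms is written into the repaired sequence.
    """
    left, right = donor[:arm], donor[-arm:]
    if seq[cut - arm:cut] != left or seq[cut:cut + arm] != right:
        raise ValueError("homology arms do not match the sequence flanking the cut")
    return seq[:cut - arm] + donor + seq[cut + arm:]


original = "AAAACCCCGGGGTTTT"
# Using the other parental copy as template restores the original sequence...
print(hdr_repair(original, cut=8, donor="CCCCGGGG", arm=4))     # AAAACCCCGGGGTTTT
# ...while a designed donor with an extra "ATG" between the arms installs an edit.
print(hdr_repair(original, cut=8, donor="CCCCATGGGGG", arm=4))  # AAAACCCCATGGGGGTTTT
```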
link |
So that's what CRISPR is.
link |
That's what CRISPR is.
link |
In popular culture, people use the term.
link |
I've never, wow, that's brilliant.
link |
So CRISPR is genome vandalism followed by a bunch of band aids that have the sequence you choose.
link |
And you could control the choices of band aids.
link |
And of course there's new generations of CRISPR.
link |
There's something that's called prime editing that was sort of very, very much in the press
link |
recently that basically instead of sort of making a double stranded break, which again
link |
is genome vandalism, you basically make a single stranded break.
link |
You basically just nick one of the two strands, enabling you to sort of peel off without sort
link |
of completely breaking it up and then repair it locally using a guide that is coupled to
link |
your initial RNA that took you to that location.
link |
Dumb question, but is CRISPR as awesome and cool as it sounds?
link |
I mean, technically speaking, in terms of like as a tool for manipulating our genetics
link |
in the positive meaning of the word manipulating, or is there downsides, drawbacks in this whole
link |
context of therapeutics that we're talking about or understanding and so on?
link |
So when I teach my students about CRISPR, I show them articles with the headline, genome
link |
editing tool revolutionizes biology.
link |
And then I show them the date of these articles and they're 2004, like five years before CRISPR
link |
And the reason is that they're not talking about CRISPR.
link |
They're talking about zinc finger nucleases, which are another way to bring these cutters to a particular location.
link |
It's a very difficult way of sort of designing the right set of zinc finger proteins, the
link |
right set of amino acids that will now target a particular long stretch of DNA because for
link |
every location that you want to target, you need to design a particular regulator, a particular
link |
protein that will match that region well.
link |
There's another technology called TALENs, which are basically just a different way of
link |
using proteins to sort of guide these cutters to a particular location of the genome.
link |
These require a massive team of engineers, of biological engineers to basically design
link |
a set of amino acids that will target a particular sequence of your genome.
link |
The reason why CRISPR is amazingly, awesomely revolutionary is because instead of having
link |
this team of engineers design a new set of proteins for every locus that you want to
link |
target, you just type it in your computer and you just synthesize an RNA guide.
link |
The beauty of CRISPR is not the cutting, it's not the fixing.
link |
All of that was there before.
link |
It's the guiding, and the only thing that changes is that it makes the guiding easier
link |
by sort of just typing in the RNA sequence, which then allows the system to sort of scan
link |
the DNA to find that.
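"Just type it in your computer" can be taken almost literally: for an SpCas9-style system, designing a guide amounts to finding 20-nucleotide stretches followed by an NGG PAM and writing them as RNA. This sketch lists forward-strand candidates only and skips what real design tools add (reverse strand, off-target scoring, efficiency prediction); the target sequence is hypothetical.

```python
def candidate_guides(target_dna: str, length: int = 20) -> list[tuple[int, str]]:
    """(start, guide RNA) for every forward-strand protospacer immediately followed by an NGG PAM."""
    dna = target_dna.upper()
    guides = []
    for i in range(len(dna) - length - 2):
        if dna[i + length + 1:i + length + 3] == "GG":  # PAM = any base at i+length, then GG
            guides.append((i, dna[i:i + length].replace("T", "U")))
    return guides


target = "GACGTTACCGGATCAAGCTT" + "AGG"  # hypothetical 20-nt protospacer plus an AGG PAM
print(candidate_guides(target))  # [(0, 'GACGUUACCGGAUCAAGCUU')]
```

Compared with zinc fingers or TALENs, retargeting here means changing a string, not engineering a new protein.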
link |
So the coding, the engineering of the cutter, is easier with CRISPR.
link |
That's kind of similar to the story of deep learning versus old school machine learning.
link |
Some of the challenging parts are automated.
link |
But CRISPR is just one cutting technology, and then that's part of the challenges and
link |
exciting opportunities of the field is to design different cutting technologies.
link |
So now this was a big parenthesis on CRISPR, but now when we were talking about perturbations,
link |
you basically now have the ability to not just look at correlation between enhancers
link |
and genes, but actually go and either destroy that enhancer and see if the gene changes
link |
in expression, or you can use the CRISPR targeting system to bring in not vandalism and cutting,
link |
but you can couple the CRISPR system with, and the CRISPR system is usually called CRISPR
link |
Cas9 because Cas9 is the protein that will then come and cut.
link |
But there's a version of that protein called dead Cas9 where the cutting part is deactivated.
link |
So you basically use the dead Cas9 to bring in an activator or to bring in a repressor.
link |
So you can now ask, is this enhancer changing that gene by taking this modified CRISPR,
link |
which is already modified from the bacteria to be used in humans, that you can now modify
link |
the Cas9 to be dead Cas9, and you can now further modify to bring in a regulator, and
link |
you can basically turn on or turn off that enhancer and then see what is the impact on the gene.
link |
So these are the four ways of linking the locus to the target gene, and that's step five.
link |
Step number five is find the target gene, and step number six is what the heck does that gene do?
link |
You basically now go and manipulate that gene to basically see what are the processes that
link |
change, and you can basically ask, well, in this particular case, in the FTO locus, we
link |
found mesenchymal stem cells that are the progenitors of white fat and brown fat or beige fat.
link |
We found the rs1421085 nucleotide variant as the causal variant.
link |
We found this large enhancer, this master regulator.
link |
I like to call it OB1 for obesity one, like the strongest enhancer associated with it,
link |
and OB1 was kind of chubby as the actor.
link |
I don't know if you remember him.
link |
So you basically are using this Jedi mind trick to basically find out the location of
link |
the genome that is responsible, the enhancer that harbors it, the motif, the upstream regulator,
link |
which is ARID5B, for AT-rich interaction domain 5B.
link |
That's a protein that sort of comes and binds normally.
link |
That protein is normally a repressor.
link |
It represses this super enhancer, this massive 12,000 nucleotide master regulatory control
link |
region, and it turns off IRX3, which is a gene that's 600,000 nucleotides away, and IRX5,
link |
which is 1.2 million nucleotides away.
link |
And what's the effect of turning them off?
link |
That's exactly the next question.
link |
So step six is what do these genes actually do?
link |
So we then ask, what do IRX3 and IRX5 do?
link |
The first thing we did is look across individuals for individuals that had higher expression
link |
of IRX3 or lower expression of IRX3.
link |
And then we looked at the expression of all of the other genes in the genome.
link |
And we looked for simply correlation.
link |
And we found that IRX3 and IRX5 were both correlated positively with lipid metabolism and negatively
link |
with mitochondrial biogenesis.
link |
You're like, what the heck does that mean?
link |
Does this sound related to obesity?
link |
Not at all superficially, but lipid metabolism should, because lipids are these
link |
high-energy molecules that basically store energy.
link |
So IRX3 and IRX5 are positively correlated with lipid metabolism.
link |
So that basically means that when they turn on, they turn
link |
on lipid metabolism.
link |
And they're negatively correlated with mitochondrial biogenesis.
link |
What do mitochondria do in this whole process?
link |
Again, small parenthesis, what are mitochondria?
link |
Mitochondria are little organelles.
link |
They are only found in eukaryotes.
link |
Eu means good, karyote means nucleus.
link |
So truly like a true nucleus.
link |
So eukaryotes have a nucleus.
link |
Prokaryotes are before the nucleus.
link |
They don't have a nucleus.
link |
So eukaryotes have a nucleus, compartmentalization.
link |
Eukaryotes have also organelles.
link |
Some eukaryotes have chloroplasts.
link |
These are the plants, they photosynthesize.
link |
Some other eukaryotes like us have another type of organelle called mitochondria.
link |
These arose from an ancient species that we engulfed.
link |
This is an endosymbiosis event.
link |
Symbiosis: bio means life, sym means together.
link |
So symbiotes are things that live together.
link |
Endo means inside, so endosymbiosis means you live together, holding the other one inside.
link |
So the pre eukaryotes engulfed an organism that was very good at energy production and
link |
that organism eventually shed most of its genome, to now have only 13 protein-coding genes in the mitochondrial
link |
genome, and those 13 genes are all involved in energy production, in the electron transport chain.
link |
So basically electrons are these massive super energy rich molecules.
link |
We basically have these organelles that produce energy and when your muscle exercises, you
link |
basically multiply your mitochondria.
link |
You basically sort of, you know, use more and more mitochondria and that's how you get
link |
So basically the muscle sort of learns how to generate more energy.
link |
So basically every single time your muscles will, you know, overnight regenerate and sort
link |
of become stronger and amplify their mitochondria and so forth.
link |
So what does mitochondria do?
link |
The mitochondria use energy to sort of do any kind of task.
link |
When you're thinking, you're using energy.
link |
This energy comes from mitochondria.
link |
Your neurons have mitochondria all over the place.
link |
Basically this mitochondria can multiply as organelles and they can be spread along the
link |
body of your muscle.
link |
Some of your muscle cells have actually multiple nuclei, they're polynucleated, but they also
link |
have multiple mitochondria to basically deal with the fact that your muscle is enormous.
link |
You can sort of span these super, super long length and you need energy throughout the
link |
length of your muscle.
link |
So that's why you have mitochondria throughout the length and you also need transcription
link |
through the length so you have multiple nuclei as well.
link |
So those are the two processes: lipids store energy. What do mitochondria do? There's a process known as thermogenesis. Thermo, heat; genesis, generation. Thermogenesis is the generation of heat.

Remember that bathtub with the in and out? That's the equation that everybody's focused on: how much energy do you consume, and how much energy do you burn? But in every thermodynamic system, there are three parts to the equation: energy in, energy out, and energy lost. Any machine loses energy. How do you lose energy? Heat is energy loss, which is where thermogenesis comes in. Thermogenesis is actually a regulatory process that modulates the third component of the thermodynamic equation. You can control thermogenesis explicitly; you can turn it on and turn it off. And that's where the mitochondria come into play.

So IRX3 and IRX5 turn out to be the master regulators of a switch between thermogenesis and lipogenesis, the generation of fat. In most people, this circuit burns calories as heat: when you eat too much, you just burn it off in your fat cells. So that bathtub basically has a dissipation knob that most people are able to turn on.
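In the three-part framing above, the bathtub level is whatever remains after energy out and energy lost as heat. A simplified bookkeeping sketch; the symbols and the numbers (arbitrary units) are illustrative, not physiological measurements:

```latex
\begin{aligned}
\Delta E_{\text{stored}} &= E_{\text{in}} - E_{\text{out}} - E_{\text{heat}} \\
\text{knob off: } \Delta E_{\text{stored}} &= 2500 - 2000 - 0 = 500 \\
\text{knob on: } \Delta E_{\text{stored}} &= 2500 - 2000 - 300 = 200
\end{aligned}
```

With intake and activity held fixed, turning up thermogenesis is the only term that shrinks what gets stored.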
I am unable to turn that on because I am a homozygous carrier of the mutation that changes a T into a C at rs1421085, a SNP. I have the risk allele twice, from my mom and from my dad. So I'm unable to thermogenize. I'm unable to turn on thermogenesis through IRX3 and IRX5 because the regulator that normally binds here, ARID5B, can no longer bind. It has an AT-rich interaction domain, and as soon as I change the T into a C, it can no longer bind because the site is no longer AT-rich.
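The binding argument is about sequence composition: a domain that prefers AT-rich DNA loses its site when a T becomes a C. A toy sketch of that logic, assuming a made-up 8-bp window and a made-up "longest A/T run" rule; neither is the real sequence around rs1421085 nor the real ARID5B binding model.

```python
def substitute(seq, pos, new_base):
    """Return seq with the base at 0-based position pos replaced by new_base."""
    return seq[:pos] + new_base + seq[pos + 1:]


def at_fraction(seq):
    """Fraction of bases that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def longest_at_run(seq):
    """Length of the longest uninterrupted stretch of A/T bases."""
    best = run = 0
    for base in seq.upper():
        run = run + 1 if base in "AT" else 0
        best = max(best, run)
    return best


# Hypothetical window and rule, for illustration only.
ref_window = "AATTAATA"
alt_window = substitute(ref_window, 3, "C")  # the T -> C change
MIN_RUN = 6  # toy threshold: "binds" if the A/T run is at least this long

for name, window in [("ref", ref_window), ("alt", alt_window)]:
    print(name, window, at_fraction(window), longest_at_run(window) >= MIN_RUN)
```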
But doesn't that mean that you're able to use the energy more efficiently? You're not generating heat, or is that...?

That means I can eat less and get around just fine.

So that's a feature, actually.

It's a feature in a food-scarce environment. If we're all starving, I'm doing great. If we all have access to massive amounts of food, I'm basically obese. That's taken us through the entire process of understanding why mitochondria and lipids, even though distant, are both somehow involved. Different sides of the same coin: you can choose to store energy, or you can choose to burn energy.

And then all of that is involved in the puzzle of obesity.

And that's what's fascinating, right? Here we are in 2007, discovering the strongest genetic association with obesity, and knowing nothing about how it works for almost 10 years.
For 10 years, everybody focused on this FTO gene, and they were like, oh, it must have something to do with RNA modification. And it's like, no, it has nothing to do with the function of FTO. It has everything to do with all of these other processes. And then suddenly you solve that puzzle, which was a multiyear effort, by the way, a tremendous effort by Melina Claussnitzer and many, many others. This tremendous effort led us to recognize this circuitry. We went from having some 89 common variants associated in that region of the DNA, sitting on top of this gene, to knowing the whole circuitry.
When you know the circuitry, you can go crazy. You can start intervening at every level: at the ARID5B level; with CRISPR-Cas9 at the single-SNP level; at IRX3 and IRX5 directly; at the thermogenesis level, because you know the pathway. You can start intervening at the differentiation level, where the decision to make either white fat or beige fat, the energy-burning fat, is made developmentally in the first three days of differentiation of your adipocytes. As they're differentiating, you can basically choose to make fat-burning machines or fat-storing machines, and that's how you populate your fat. You can now go in pharmaceutically and do all of that.

And in our paper, we actually did all of that. We went in and manipulated every single aspect. At the nucleotide level, we used CRISPR-Cas9 genome editing on primary adipocytes from risk and non-risk individuals, and showed that by editing that one nucleotide out of the 3.2 billion nucleotides in the human genome, you could flip between an obese phenotype and a lean phenotype like a switch. You can take my cells, which are non-thermogenizing, and flip them into thermogenizing cells by changing one nucleotide. It's mind-boggling.

It's so inspiring that this puzzle could be solved in this way, and it feels within reach to then crack the problem of some of these diseases.
What are the technologies, the tools that came along that made this possible? What are you excited about? Maybe if we just look at the buffet of things that you've mentioned, what's involved? What should we be excited about? What are you excited about?

I love that question because there's so much ahead of us. There's so, so much. Solving that one locus required massive amounts of knowledge that we have been building across the years: through the epigenome; through comparative genomics, to find the causal variant and the controller regulatory motif; through the conserved circuitry. It required knowing this regulatory genomic wiring. It required Hi-C of these topologically associated domains to find the long-range interactions. It required eQTLs, these genetic perturbations of intermediate gene phenotypes. The entire arsenal of tools that I've been describing was put together for this one locus. And this was a massive team effort, a huge investment of time, energy, money, and intellectual effort.

You're referring to, I'm sorry, just the obesity one?

Yeah, this one paper. This one single paper. This one single locus.
I like to say that this is a paper about one nucleotide in the human genome, about one bit of information, C versus T. That's one bit of information, and we have 3.2 billion nucleotides to go through.
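The "one bit" arithmetic can be made explicit. A two-allele SNP (C versus T) carries log2(2) = 1 bit; a position that could hold any of four bases carries log2(4) = 2 bits. A back-of-the-envelope sketch, treating every position as independent and uniformly distributed, which real genomes are not:

```python
import math

GENOME_LENGTH = 3_200_000_000  # nucleotides, as quoted in the conversation

bits_per_snp = math.log2(2)       # two alleles, e.g. C vs T
bits_per_position = math.log2(4)  # any of A, C, G, T
total_bits = GENOME_LENGTH * bits_per_position
total_megabytes = total_bits / 8 / 1_000_000

print(bits_per_snp, bits_per_position, total_megabytes)  # 1.0 2.0 800.0
```

So at 2 bits per base the whole sequence is roughly 800 MB uncompressed, and a single variant like rs1421085 is one bit of it.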
So how do you do that systematically? I am so excited about the next phase of research because the technologies that my group and many other groups have developed allow us to do this systematically, not one locus at a time, but thousands of loci at a time. So let me describe some of these technologies.

The first one is automation and robotics. We talked about how you can take all of these molecules and see which of them target each of these genes and what they do. So you can now screen millions of molecules across thousands and thousands of plates, each of which holds thousands of molecules, every single time testing all of these genes and asking which of these molecules perturb them. So that's technology number one, automation and robotics.
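One common way to decide which molecules "perturb these genes" in a plate-based screen is to compare each well with the plate's negative-control wells. A minimal z-score sketch; the well names, values, and the cutoff of 3 are hypothetical, and production screens typically use more robust statistics and replicate wells.

```python
import statistics


def call_hits(readouts, control_wells, z_cutoff=3.0):
    """Return {well: z} for non-control wells far from the negative controls.

    readouts: {well_id: measured expression of the gene of interest}
    control_wells: set of well ids treated with vehicle only
    """
    controls = [readouts[w] for w in control_wells]
    mu = statistics.mean(controls)
    sd = statistics.stdev(controls)  # sample standard deviation
    hits = {}
    for well, value in readouts.items():
        if well in control_wells:
            continue
        z = (value - mu) / sd
        if abs(z) >= z_cutoff:
            hits[well] = z
    return hits


plate = {"c1": 100, "c2": 102, "c3": 98, "c4": 100, "m1": 101, "m2": 120, "m3": 90}
hits = call_hits(plate, {"c1", "c2", "c3", "c4"})
print(sorted(hits))  # ['m2', 'm3']
```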
Technology number two is parallel readouts. Instead of perturbing one locus at a time, using dCas9 to turn an enhancer on or off, or using CRISPR-Cas9 to change one SNP, and then asking what happens, we want to test 120,000 disease-associated SNPs. We don't want to spend 120,000 years doing it. So we've developed this technology for massively parallel reporter assays, MPRA, in collaboration with Tarjei Mikkelsen and Eric Lander, and Jay Shendure's group has also done a lot of that. So a lot of groups have developed technologies for testing 10,000 genetic variants at a time.
How do you do that? We talked about microarray technology, the ability to synthesize these huge microarrays that let you do all kinds of things: measure gene expression by hybridization, or measure the genotype of a person by looking at hybridization with one version with a T versus the other version with a C, and then figure out that I am a risk carrier for obesity based on the differential hybridization of my genome that says, oh, you seem to only have this allele, or you seem to have that allele.

These arrays can also be used to systematically synthesize small fragments of DNA. You can synthesize 150-nucleotide-long fragments across 450,000 spots at a time. The synthesis works through layers, adding one nucleotide at a time. You can just type the sequences into your computer and order them, 10,000 or 100,000 of these small DNA segments at a time.

And that's where awesome molecular biology comes in. You can take all these segments and give them a common start and end sequence, sort of like adapters, just like pieces of a puzzle. You make the same start piece and the same end piece for all of them.
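Ordering such a library is mostly string assembly: each designed sequence is flanked by the same start and end pieces so that every oligo is compatible with the downstream cloning. A minimal sketch that builds the risk and non-risk versions of one variant as 150-nt oligos; the adapter sequences, the variant name, and the toy context are hypothetical placeholders.

```python
OLIGO_LEN = 150
# Hypothetical common adapters; any fixed sequences compatible with the vector would do.
ADAPTER_5 = "ACTGGCCGCTTGACG"
ADAPTER_3 = "CACTGCGGCTCCTGC"
VARIABLE_LEN = OLIGO_LEN - len(ADAPTER_5) - len(ADAPTER_3)


def design_allele_pair(variant_id, context, pos, ref, alt):
    """Return {name: oligo} for the reference and alternate allele of one variant.

    context: genomic sequence of length VARIABLE_LEN around the variant
    pos: 0-based index of the variant within context
    """
    if len(context) != VARIABLE_LEN:
        raise ValueError(f"context must be {VARIABLE_LEN} nt, got {len(context)}")
    if context[pos] != ref:
        raise ValueError("reference base does not match the context")
    alt_context = context[:pos] + alt + context[pos + 1:]
    return {
        f"{variant_id}_ref": ADAPTER_5 + context + ADAPTER_3,
        f"{variant_id}_alt": ADAPTER_5 + alt_context + ADAPTER_3,
    }


# Toy context with the variant in the middle (not a real genomic sequence).
context = "A" * 60 + "T" + "G" * (VARIABLE_LEN - 61)
library = design_allele_pair("variant_1", context, 60, "T", "C")
for name, oligo in library.items():
    print(name, len(oligo))
```

Scaling to 10,000 or 450,000 oligos is the same function applied in a loop over variants.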
And you can now use plasmids, which are these small, circular, extrachromosomal DNA segments that float around outside the chromosomes. Bacteria use plasmids for transferring DNA, and that's where they put a lot of antibiotic resistance genes, so they can easily transfer them from one bacterium to another. After one bacterium evolves a gene that makes it resistant to a particular antibiotic, it basically says to all its friends, hey, here's that DNA piece.

We can now co-opt these plasmids into human cells. You make a human cell culture and add plasmids that contain the things you want to test. You now have this library of 450,000 elements. You can insert each of them into the common plasmid and then test them in millions of cells in parallel.

And the common plasmid is all the same before you add it?

The rest of the plasmid is the same. So it's called an episomal reporter assay. Episome means not inside the genome; it's outside the chromosomes. So it's an episomal assay that lets you have a variable region, where you test 10,000 different enhancers, and a common region, which has the same reporter gene.
You now can do some very cool molecular biology. You take the 450,000 elements that you've generated, and you have a piece of the puzzle here and a piece of the puzzle there, which are identical, so they're compatible with that plasmid. You can cut them in the middle to separate a barcode reporter from the enhancer, and in the middle put the same gene again, using the same pieces of the puzzle. You now have a barcode readout of the impact of 10,000 different versions of an enhancer on gene expression. So we're not doing one experiment, we're doing 10,000 experiments. And those 10,000 can be 5,000 different loci, each in two versions, risk and non-risk. So I can now test tens of thousands. Each one is just a little hypothesis, and we can test 10,000 hypotheses at once.
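A common way to turn the barcode readout into one number per element is to compare how often its barcodes show up in the RNA (expression) versus the plasmid DNA (input): elements that drive expression produce more RNA per DNA copy. A simplified sketch that sums barcodes per element, adds a pseudocount, and skips library-size normalization; barcode names and counts are made up, and the risk versus non-risk comparison is just the difference of the two log ratios.

```python
import math
from collections import defaultdict


def element_activity(barcode_counts, barcode_to_element, pseudocount=1.0):
    """log2((RNA + pc) / (DNA + pc)) per element after summing its barcodes.

    barcode_counts: {barcode: (dna_count, rna_count)}
    barcode_to_element: {barcode: element name}
    """
    dna = defaultdict(float)
    rna = defaultdict(float)
    for barcode, (dna_count, rna_count) in barcode_counts.items():
        element = barcode_to_element[barcode]
        dna[element] += dna_count
        rna[element] += rna_count
    return {
        el: math.log2((rna[el] + pseudocount) / (dna[el] + pseudocount))
        for el in dna
    }


counts = {"bc1": (50, 49), "bc2": (49, 50), "bc3": (40, 159), "bc4": (59, 240)}
mapping = {"bc1": "locus1_nonrisk", "bc2": "locus1_nonrisk",
           "bc3": "locus1_risk", "bc4": "locus1_risk"}
activity = element_activity(counts, mapping)
allelic_skew = activity["locus1_risk"] - activity["locus1_nonrisk"]
print(activity, allelic_skew)
```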
How hard is it to generate those 10,000?

Generating the 10,000 is trivial, because it's biotechnology. You have these arrays that add one nucleotide at a time at every spot. It's like printing, so you're able to control it.

Is it super costly?

It isn't millions. Something like 10,000 bucks for 10,000 experiments sounds like the right ballpark.

That's exciting, because you don't have to do one thing at a time.

You can now use that technology, these massively parallel reporter assays, to test 10,000 locations at a time.
We've made multiple modifications to that technology. One was Sharpr-MPRA, which basically gives a higher-resolution view by tiling these elements, so you can see where along the region the control lies.
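Tiling means covering a region with many overlapping, shifted fragments, so each nucleotide sits inside some tiles and outside others; comparing the activity of tiles that include versus exclude a position is what localizes the driver nucleotides. A sketch of the coordinate bookkeeping only; the 295-nt region, 145-nt tiles, and 5-nt step are illustrative parameters, not necessarily those used in Sharpr-MPRA.

```python
def tile(region_start, region_end, tile_len, step):
    """Half-open (start, end) coordinates of overlapping tiles within [region_start, region_end)."""
    tiles = []
    start = region_start
    while start + tile_len <= region_end:
        tiles.append((start, start + tile_len))
        start += step
    return tiles


def tiles_covering(tiles, position):
    """Tiles that contain the given position."""
    return [(s, e) for s, e in tiles if s <= position < e]


tiles = tile(0, 295, tile_len=145, step=5)
print(len(tiles), tiles[0], tiles[-1])  # 31 (0, 145) (150, 295)
print(len(tiles_covering(tiles, 0)), len(tiles_covering(tiles, 147)))  # 1 29
```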
And we made another modification called HiDRA, for high-definition reporter assay, which lets you test 7 million of these at a time by cutting them directly from the DNA. Synthesis has a limit of 450,000 elements at a time, so instead we said, hey, if we want to test all accessible regions of the genome, let's just do an experiment that cuts out accessible regions. Let's take those accessible regions, give them all the same end joints of the puzzle, and use them to create a much, much larger array of things that you can test. And by tiling all of these regions, you can pinpoint the driver nucleotides and the elements, and see how they act, across 7 million experiments at a time. So this is all the same family of technology, where you're using these parallel readouts of barcodes.
And then to do this, we used a technology called STARR-seq, for self-transcribing active regulatory region sequencing, developed by Alex Stark, my former postdoc, who's now a PI over in Vienna. So we coupled this with the STARR-seq self-transcribing reporters, where the enhancer can be part of the gene itself. Instead of having a separate barcode, the enhancer turns on the gene and is transcribed as part of it. So you don't need two separate parts; you can just read them directly. So there are constant improvements in this whole process.
By the way, generating all these options, is it basically brute force? How much human intuition is there?

Oh gosh, of course it's human intuition and human creativity, and incorporating all of the input data sets. Because again, the genome is enormous. It's 3.2 billion nucleotides; you don't want to test all of that. You use all of these tools that I've talked about already, you generate your top 10,000 favorite hypotheses, and then you go and test all 10,000. And from what comes out, you can go to the next step. So that's technology number two.

So technology number one is robotics and automation, where you have thousands of wells and you constantly test them. The second technology is, instead of having wells, you have these massively parallel readouts in pooled assays. The third technology is coupling CRISPR perturbations with single-cell RNA readouts. So let me open another parenthesis here to describe single-cell RNA sequencing.
So what does single-cell RNA sequencing mean? RNA sequencing is what has been traditionally used, well, traditionally for the last 20 years, ever since the advent of next-generation sequencing. Before that, RNA expression profiling was based on microarrays. The next technology after that was based on sequencing. You chop up your RNA and sequence the small fragments, just like you would sequence a genome: you reverse transcribe the RNA fragments into DNA, and you sequence that DNA to get the number of sequencing reads corresponding to the expression level of every gene in the genome. You now have RNA sequencing.
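Turning read counts into comparable expression levels needs at least a correction for how deeply each sample was sequenced. A minimal counts-per-million sketch; the gene names and counts are hypothetical, and CPM deliberately ignores gene length (measures such as TPM account for it).

```python
def counts_per_million(read_counts):
    """Scale raw read counts so each sample sums to one million."""
    total = sum(read_counts.values())
    return {gene: count * 1_000_000 / total for gene, count in read_counts.items()}


sample_a = {"GENE_A": 750, "GENE_B": 250}
sample_b = {"GENE_A": 1500, "GENE_B": 500}  # same composition, sequenced twice as deeply
print(counts_per_million(sample_a))  # {'GENE_A': 750000.0, 'GENE_B': 250000.0}
print(counts_per_million(sample_a) == counts_per_million(sample_b))  # True
```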
How do you go to single-cell RNA sequencing? That technology also went through stages of evolution. The first was microfluidics. You had these chambers, or even wells, ways of isolating individual cells and putting each one into its own well. So you have 384-well plates, and you now do 384 parallel reactions to measure the expression of each cell. That sounds amazing, and it was amazing, but we want to do a million cells. How do you go from these wells to a million cells?

The next technology after that was, instead of using a well for every reaction, you now use a droplet for every reaction. You use microdroplets as reaction chambers to amplify RNA. So here's the idea. You have microfluidics where every single cell comes down one tube, and little bubbles get created in the other channel, carrying specific primers that mark every cell with its own barcode. You couple the two, and you end up with little bubbles that each have a cell and tons of markers for that cell. You now mark all of the RNA for that one cell with the same exact barcode, then you break open all of the droplets and sequence the heck out of that, and for every RNA molecule you have a unique identifier that tells you which cell it came from.
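After sequencing, every read carries the barcode of the droplet (cell) it came from, so building a cell-by-gene count table is a grouping step. A minimal sketch with one added assumption beyond the description above: many droplet protocols also attach a per-molecule tag (a UMI), so PCR copies of the same molecule are counted once. Barcodes, UMIs, and gene names here are hypothetical.

```python
from collections import defaultdict


def count_matrix(reads):
    """Build {cell_barcode: {gene: molecules}} from (cell_barcode, umi, gene) reads.

    Reads sharing the same cell barcode, UMI, and gene are treated as PCR
    duplicates of one molecule and counted once.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for cell, umi, gene in reads:
        if (cell, umi, gene) in seen:
            continue
        seen.add((cell, umi, gene))
        counts[cell][gene] += 1
    return {cell: dict(genes) for cell, genes in counts.items()}


reads = [
    ("AAACG", "u1", "GENE_A"),
    ("AAACG", "u1", "GENE_A"),  # duplicate read of the same molecule
    ("AAACG", "u2", "GENE_A"),
    ("TTTGC", "u1", "GENE_B"),
]
print(count_matrix(reads))  # {'AAACG': {'GENE_A': 2}, 'TTTGC': {'GENE_B': 1}}
```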
That is such good engineering: microfluidics, and using some kind of primer to put a label on each cell. I mean, you're making it sound easy. I assume it's beautiful.

It's gorgeous. And there's the next generation. So that was the second generation. The next generation is: forget the microfluidics altogether. Just use big bottles. How can you possibly do that with big bottles?

So here's the idea. You dissociate all of your cells, or, for complex cells like brain cells that are very long and sticky and can't be dissociated whole, all of your nuclei. If you have blood cells, or neuronal or brain nuclei, you can dissociate, let's say, a million cells. You now want to add a unique barcode to each one of a million cells, using just bottles. How can you possibly do that? It sounds crazy, but here's the idea.

You use a hundred of these bottles. You randomly shuffle all your million cells and throw them into those hundred bottles, completely randomly. You add one barcode out of a hundred to every one of the cells. You now take them all out, shuffle them again, and throw them again into the same hundred bottles, but now with a different randomization, and you add a second barcode. So every cell now has two barcodes. You take them out again, shuffle them, and throw them back in, and a third barcode is added, again from the same hundred barcodes. You've now labeled every cell probabilistically, based on the unique path it took: which of a hundred bottles it went into the first time, which of a hundred the second time, and which of a hundred the third time. A hundred times a hundred times a hundred is a million unique barcode combinations for labeling these cells, without ever using microfluidics.
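The bottle scheme is combinatorial indexing, often called split-pool barcoding: three rounds of 100 bottles give 100³ = 1,000,000 possible combinations. Because bottles are chosen at random, two cells can take the same path, so the labeling is probabilistic in exactly the sense described. A small simulation plus the birthday-problem estimate of what fraction of cells get a combination no other cell shares; the cell numbers are illustrative.

```python
import random
from collections import Counter


def split_pool(n_cells, n_bottles=100, n_rounds=3, seed=0):
    """Give each cell a barcode = the tuple of bottles it landed in, one per round."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(n_bottles) for _ in range(n_rounds))
            for _ in range(n_cells)]


def fraction_unique(labels):
    """Fraction of cells whose barcode combination is shared with no other cell."""
    counts = Counter(labels)
    return sum(1 for label in labels if counts[label] == 1) / len(labels)


def expected_unique(n_cells, n_barcodes):
    """Chance a given cell's combination is not drawn by any of the other n_cells - 1 cells."""
    return (1 - 1 / n_barcodes) ** (n_cells - 1)


labels = split_pool(10_000)
print(round(fraction_unique(labels), 3), round(expected_unique(10_000, 100**3), 3))
print(round(expected_unique(1_000_000, 100**3), 3))  # 0.368
```

With a million cells and a million combinations, only about 37% of cells would be uniquely labeled, which is why such schemes load far fewer cells than there are combinations, or add more rounds.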
It's beautiful, right?

From a computer science perspective, that's very clever.

So you now have single-cell sequencing technology. You can use the wells, you can use the bubbles, or you can use the bottles, and you have way...

The bubbles still sound pretty damn cool.

The bubbles are awesome, and that's basically the main technology that we're using. There are kits now that companies sell to carry out single-cell RNA sequencing: for $2,000, you can get 10,000 cells from one sample, and for every one of those cells, you have the transcription of thousands of genes. Of course, the data for any one cell is noisy, but being computer scientists, we can aggregate the data from all of the cells, across thousands of individuals, to make very robust inferences.
So the third technology is single-cell RNA sequencing, which lets you start asking not just what the brain expression difference is for that genetic variant, but what the expression difference is for that one genetic variant across every single cell subtype. How is the variance changing? With a bulk brain sample, you can only ask about the mean, the average expression. If I instead have 3,000 cells that are neurons, I can ask not just what the neuronal expression is. I can say: for layer 5 excitatory neurons, of which I have, I don't know, 300 cells, what is the variance associated with this genetic variant? So suddenly it's amazingly more powerful. I can start asking about this middle layer of gene expression at unprecedented resolution.

So when you look at the average, it washes out some potentially important signal that ultimately corresponds to the disease.
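A toy illustration of how an average can wash out signal: if a variant raises a gene's expression in one cell type and lowers it in another, the bulk mean can be identical between genotypes while the per-cell-type means differ sharply. The cell-type labels, genotypes, and values below are made up.

```python
import statistics
from collections import defaultdict


def summarize(cells, key):
    """Group (cell_type, genotype, expression) records by key(record); return (n, mean, variance)."""
    groups = defaultdict(list)
    for record in cells:
        groups[key(record)].append(record[2])
    return {k: (len(v), statistics.mean(v), statistics.pvariance(v)) for k, v in groups.items()}


cells = [
    ("L5_excitatory", "risk", 4.0), ("L5_excitatory", "risk", 6.0),
    ("L5_excitatory", "non_risk", 1.0), ("L5_excitatory", "non_risk", 3.0),
    ("astrocyte", "risk", 1.0), ("astrocyte", "risk", 3.0),
    ("astrocyte", "non_risk", 4.0), ("astrocyte", "non_risk", 6.0),
]

bulk = summarize(cells, key=lambda r: r[1])               # genotype only
per_type = summarize(cells, key=lambda r: (r[0], r[1]))   # cell type and genotype
print(bulk["risk"][1], bulk["non_risk"][1])  # 3.5 3.5
print(per_type[("L5_excitatory", "risk")][1], per_type[("L5_excitatory", "non_risk")][1])  # 5.0 2.0
```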
I can do that at the RNA level, but I can also do it at the DNA level, for the epigenome. Remember how I was telling you before about all this technology that we're using to probe the epigenome? One of them is DNA accessibility. What we're doing in my lab is, from the same dissociation of, say, a brain sample, where you now have tens of thousands of cells floating around, we take half of them to do RNA profiling and the other half to do epigenome profiling, both at the single-cell level. That lets you figure out which of the millions of DNA enhancers are accessible in every one of tens of thousands of cells. And computationally, we can now take the RNA and the DNA readouts and group them together to figure out how every enhancer is related to every gene.

And remember the enhancer-gene linking that we were doing across 833 samples? 833 is awesome, don't get me wrong, but 10 million is way more awesome. So we can now look at correlated activity across 2.3 million enhancers and 20,000 genes in each of millions of cells, to start piecing together the regulatory circuitry of every single type of neuron, astrocyte, oligodendrocyte, and microglial cell, inside the brains of 1,500 individuals that we sample across multiple brain regions, for both DNA and RNA.
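At its simplest, linking enhancers to genes from these readouts is a correlation between accessibility and expression across many matched units. Because the RNA and chromatin profiles come from different halves of the same dissociation, this sketch assumes the rows have already been matched, for example as groups of cells assigned to the same cell state in both assays; that matching step and the toy values are assumptions, and real pipelines also restrict pairs by genomic distance and test significance.

```python
import numpy as np


def enhancer_gene_correlations(access, expr):
    """Pearson r for every (enhancer, gene) pair across matched rows.

    access: (n_units, n_enhancers) accessibility; expr: (n_units, n_genes) expression.
    Returns an (n_enhancers, n_genes) matrix. Assumes no column has zero variance.
    """
    za = (access - access.mean(axis=0)) / access.std(axis=0)
    ze = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    return za.T @ ze / access.shape[0]


# Four matched cell groups; enhancer 1 rises with the gene, enhancer 2 falls.
access = np.array([[0.0, 3.0], [1.0, 2.0], [2.0, 1.0], [3.0, 0.0]])
expr = np.array([[1.0], [3.0], [5.0], [7.0]])
print(np.round(enhancer_gene_correlations(access, expr), 3))
```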
So that's the data set that my team generated last year alone. In one year, we generated 10 million cells from human brain across a dozen different disorders: schizophrenia, Alzheimer's, frontotemporal dementia, Lewy body dementia, ALS, Huntington's disease, post-traumatic stress disorder, autism, bipolar disorder, healthy aging, et cetera.

So it's possible that even just within that data set lie a lot of keys to understanding these diseases, which could then lead directly to treatment.

So basically, we are now motivating... Our computational team is in heaven right now, and we're looking for people. I mean, if you have super smart...

So this is a very interesting side question. How much of this is biology? How much of this is computation? You're the head of the Computational Biology Group, but should you be comfortable with biology to be able to solve some of these problems? Of the several hats you wear, fundamentally, are you thinking like a computer scientist here?
This is the only way. As I said, we are the descendants of the first digital computer. We're trying to understand the digital computer: the circuitry, the logic of this digital core computer, and all of the analog layers surrounding it. So the case that I've been making is that you cannot think one gene at a time. Traditional biology is dead. There's no way you can solve disease with traditional biology alone. You need it as a component. Once you've figured out IRX3 and IRX5, you can say, hey, have you guys worked on those genes with your single-gene approach? We'd love to know everything you know. And if you haven't, we now know how important these genes are; let's launch a single-gene program to dissect and understand them. But you cannot use that as a way to dissect disease. You have to think genomically. You have to think from the global perspective, and you have to build these circuits systematically.

So we need lots of computer scientists who are interested and willing to dive fully into these data and extract meaning. We need computer scientists who understand machine learning and inference, who can decompose these matrices and come up with super smart ways of dissecting them. But we also need computer scientists who understand biology, who are able to design the next generation of experiments. Because for many of these experiments, no one in their right mind would design them without thinking of the analytical approach that would be used to deconvolve the data afterwards, since it's massive amounts of ridiculously noisy data. If you don't have the computational pipeline in your head before you even design the experiment, you would never design the experiment that way. So in designing the experiment, you have to see the entirety of the computational pipeline. That drives the design. That even drives the necessity for that design. If you didn't have a computer scientist's way of thinking, you would never design these hugely combinatorial, massively parallel experiments. So that's why you need interdisciplinary teams. You need teams.
And I want to clarify what we mean by Computational Biology Group. The focus is not on the computational; the focus is on the biology. So we are a biology group. What type of biology? Computational biology. That's the type of biology that uses the whole genome. That's the type of biology that designs genomic experiments that can only be interpreted in the context of the whole genome.

So it's philosophically looking at biology as a computer, which, in the context of the history of biology, is a big transformation.

You can read the name as answering two questions. What do we do? Biology. How do we study it? Only computationally.
So all of this single-cell sequencing can now be coupled with the technology that we talked about earlier for perturbation. Here's the crazy thing. Instead of using these wells and robotic systems to test one drug at a time, or to perturb one gene at a time in thousands of wells, you can now do this using a pool of cells and single-cell RNA sequencing. You take these CRISPR perturbations, and instead of using a single guide RNA, you use a library of guide RNAs, generated exactly the same way, using this array technology. So you synthesize a thousand different guide RNAs. You now insert these guide RNAs into a pool of cells, where every cell gets one perturbation. And you use either CRISPR-Cas9 to edit the genome with these thousand perturbations, or CRISPR activation, or CRISPR repression. You now have a single-cell readout where every single cell has received one of these modifications, and you can couple the perturbation and the readout, in a massively parallel way, in a single experiment.
How are you tracking which perturbations each cell received?

There are ways of doing that, but one way is to make the perturbation an expressible vector, so that part of your RNA readout is actually the perturbation itself. You put it in an expressible part, so it basically reads itself out.
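When the guide RNA is itself expressed, its transcript is captured alongside the cell's other RNA and tagged with the same cell barcode, so assigning perturbations amounts to counting guide reads per cell. A minimal sketch; the 0.8 dominance threshold, the guide names, and the single-gene readout are hypothetical simplifications.

```python
from collections import Counter, defaultdict


def assign_guides(guide_reads, min_fraction=0.8):
    """Map each cell barcode to its dominant guide, or leave it out if ambiguous.

    guide_reads: iterable of (cell_barcode, guide_id) from captured guide transcripts.
    """
    per_cell = defaultdict(Counter)
    for cell, guide in guide_reads:
        per_cell[cell][guide] += 1
    assignments = {}
    for cell, counts in per_cell.items():
        guide, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= min_fraction:
            assignments[cell] = guide
    return assignments


def mean_expression_by_guide(assignments, expression):
    """Average one gene's expression over the cells assigned to each guide."""
    groups = defaultdict(list)
    for cell, guide in assignments.items():
        if cell in expression:
            groups[guide].append(expression[cell])
    return {guide: sum(v) / len(v) for guide, v in groups.items()}


reads = ([("c1", "sg_A")] * 5 + [("c2", "sg_B")] * 9 + [("c2", "sg_A")]
         + [("c3", "sg_A"), ("c3", "sg_B")] + [("c4", "non_targeting")] * 3)
assignments = assign_guides(reads)
expression = {"c1": 2.0, "c2": 8.0, "c3": 5.0, "c4": 4.0}
print(assignments)  # {'c1': 'sg_A', 'c2': 'sg_B', 'c4': 'non_targeting'}
print(mean_expression_by_guide(assignments, expression))
```

Cell c3 carries two guides equally and is dropped rather than guessed, which keeps ambiguous cells out of the comparison.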
So the point that I want to get across is that the sky's the limit. You have these tools, these building blocks of molecular biology. We have these massive data sets of computational biology. We have this huge ability to use machine learning, statistical methods, and linear algebra to reduce the dimensionality of all these massive data sets.
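Principal component analysis is one standard linear-algebra way to do that reduction; the conversation does not say which methods the group uses. A minimal NumPy sketch via the singular value decomposition of a centered cells-by-genes matrix; the toy matrix is made up and deliberately rank one, so a single component captures all of the variance.

```python
import numpy as np


def pca(matrix, n_components):
    """Project rows onto the top principal components using SVD.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    centered = matrix - matrix.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# 4 "cells" x 3 "genes": every row is a multiple of [1, 2, 3].
cells_by_genes = np.array([[-1.0, -2.0, -3.0], [0.0, 0.0, 0.0],
                           [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
scores, explained = pca(cells_by_genes, n_components=1)
print(scores.shape, np.round(explained, 6))  # (4, 1) [1.]
```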
And then you end up with a series of actionable targets that you can couple with pharma and go after systematically. So there's the ability to bring genetics to the epigenomics, to the transcriptomics, to the cellular readouts using these high-throughput perturbation technologies that I'm talking about, and ultimately to the organismal level, through electronic health record endophenotypes, and ultimately to the disease, with a battery of assays at the cognitive level, at the physiological level, and at every other level. There is no better or more exciting field, in my view, to be a computer scientist in, or to be a scientist in, period. This confluence of technologies, of computation, of data, of insight, and of tools for manipulation is unprecedented in human history. And I think this is what's shaping the next century to really be a transformative century for our species and for our planet.
Do you think the 21st century will be remembered for the big leaps in understanding and alleviation of disease?
If you look at the path between discovery and therapeutics, it used to be on the order of 50 years; it's been shortened to 40, 30, 20, and now it's on the order of 10 years. And the huge number of discovery technologies emerging right now will undoubtedly result, in the next few years, in the most dramatic manipulation of human biology that we've ever seen in the history of humanity.
Do you think we might be able to cure some of the diseases we started this conversation with?
It's only a matter of time. The complexity is enormous, and I don't want to underestimate it, but the number of insights is unprecedented, the ability to manipulate is unprecedented, and so is the ability to deliver small molecules and other non-traditional medicine perturbations. There's a new generation of perturbations that you can use at the DNA level, at the RNA level, at the microRNA level, at the epigenomic level; there's a whole battery of new generations of perturbations.
If you couple that with cell type identifiers that can sense, from a specific combination of signals, when you are in the right cell, and then turn on the intervention for that cell, you can now think of combinatorial interventions where you basically feed someone a synthetic biology construct that does different things in different cells.
For cancer, this is one of the therapeutic approaches that our collaborator Ron Weiss is using: engineering circuits that use microRNA sensors of the environment to know whether you're in a tumor cell, an immune cell, a stromal cell, and so forth, and then turn on particular interventions. You can create constructs that are tuned to only liver cells, or only heart cells, or only brain cells, and then couple these new generations of therapeutics with this immense amount of knowledge about which targets to choose, what biological processes to measure, and how to intervene.
My view is that disease is going to be fundamentally altered and alleviated as we go forward.
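The cell-type sensing idea can be reduced to a toy logic gate. This minimal sketch assumes a cell is summarized by a dictionary of microRNA levels and that the construct should fire only when a chosen set of "high" markers is present and a set of "low" markers is absent. The marker names, the threshold and the function name are hypothetical; real classifier circuits implement this logic biochemically, not in software.

```python
# Hypothetical helper for illustration; marker names and threshold are assumptions.
def should_express(mirna_levels, high_markers, low_markers, threshold=1.0):
    """Toy AND-gate cell classifier for a synthetic-biology construct.

    mirna_levels: dict mapping microRNA name -> measured level (arbitrary units).
    Returns True only if every 'high' marker is at or above `threshold` and
    every 'low' marker is below it; microRNAs not measured count as 0.
    """
    highs_present = all(mirna_levels.get(m, 0.0) >= threshold for m in high_markers)
    lows_absent = all(mirna_levels.get(m, 0.0) < threshold for m in low_markers)
    return highs_present and lows_absent


# Hypothetical profile: the target cell type is high in miR_A and low in miR_B.
TARGET_HIGH = ["miR_A"]
TARGET_LOW = ["miR_B"]
```

Using several markers on each side is what would let one construct behave differently in, say, tumor, immune and stromal cells.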
Next time we talk, we'll talk about the philosophical implications of that and the effect of life, but let's stick to biology for just a little longer. We did pretty good today. We stuck to the science. What are you excited about in terms of the future of this field, the technologies in your own group, in your own mind? You're leading the world at MIT in the science and the engineering... So what are you excited about here?
I could not be more excited. We are one of many, many teams who are working on this. In my team, the most exciting parts are, you know, manifold. We've now assembled this battery of technologies. We've assembled these massive, massive data sets, and now we're really at the stage of our team's path where we're generating disease insights.
Right now we are working on a paper on schizophrenia that uses the single-cell profiling technologies and these editing and manipulation technologies to show how the master regulators underlying the brain changes found in schizophrenia are in fact affecting excitatory neurons and inhibitory neurons, in pathways that are active both in synaptic pruning and in early development. We've found a set of four regulators that connect these two processes, which were previously considered separate in schizophrenia, giving a more unified view across those two sides.
The second one is in the area of metabolism. We now have a beautiful collaboration with the Goodyear lab that's looking at multi-tissue perturbations in six or seven different tissues across the body, in the context of exercise and in the context of nutritional interventions, using both mouse and human, where we can see which cell-to-cell communications are changing across... And what we're finding is an immense role for both immune cells and adipocyte stem cells in reshaping the circuitry of all of these different tissues, and that's pointing to a new path for therapeutic intervention there.
In Alzheimer's, there's a huge focus on microglia, and now we're discovering different classes of microglial cells that are either synaptic or immune, and these play vastly different roles in Alzheimer's versus in schizophrenia. What we're finding is this immense complexity as you go further and further down: there are in fact 10 different types of microglia, each with its own expression program. We used to think of them as, oh yeah, they're microglia, but now we're realizing that even in that least abundant of cell types, there's this incredible diversity.
The differences between brain regions are another major, major insight. One would often think that astrocytes are astrocytes no matter where they are. But no, there are incredible region-specific differences in the expression patterns of all of the major brain cell types across different brain regions. There are the neocortical regions, the recent innovation that makes us so different from all other species. There are the reptilian brain regions, which are much more... There's the cerebellum. Each of those is associated in a different way with disease.
And what we're doing now is looking into pseudotemporal models for how disease progresses across different regions of the brain. If you look at Alzheimer's, it starts in a small region called the entorhinal cortex, then spreads through the brain and the hippocampus, and ultimately affects the neocortex. With every brain region that it hits, it has a different impact on cognitive and memory aspects: orientation, short-term memory, long-term memory, et cetera, which dramatically affects the cognitive path that individuals go through. So what we're doing now is creating computational models for ordering the cells, the regions and the individuals according to their ability to predict Alzheimer's disease. We can have a cell-level predictor of pathology that allows us to create a time course that tells us when every gene turns on along this pathology progression, and then trace that across regions and against pathological measures that are region-specific, but also cognitive measures, and so on and so forth.
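The ordering step can be sketched in a few lines, assuming each cell already has a pathology score from some predictor and that "turning on" means a sliding-window mean crossing a fixed threshold. Both the threshold rule and the function name `gene_onsets` are illustrative assumptions; the actual models described in the conversation are not specified here.

```python
# Hypothetical helper for illustration; the onset rule is an assumption.
def gene_onsets(pathology_scores, expression, window=3, threshold=1.0):
    """Order cells by a predicted pathology score (used as pseudotime) and,
    for each gene, return the position along that ordering where it first
    turns on.

    pathology_scores: list of floats, one per cell.
    expression: dict mapping gene -> list of expression values, aligned with
        pathology_scores.
    A gene 'turns on' at the start of the first window of `window`
    consecutive cells whose mean expression reaches `threshold`; genes that
    never reach it map to None.
    """
    order = sorted(range(len(pathology_scores)), key=lambda i: pathology_scores[i])
    onsets = {}
    for gene, values in expression.items():
        ordered = [values[i] for i in order]
        onsets[gene] = None
        for start in range(len(ordered) - window + 1):
            if sum(ordered[start:start + window]) / window >= threshold:
                onsets[gene] = start
                break
    return onsets
```

Once onsets are computed per region, they could be compared across regions or against cognitive measures; that comparison is not sketched here.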
That allows us, for the first time, to ask whether we can do early intervention for Alzheimer's, where we know that the disease starts manifesting 10 years before you actually have your first cognitive loss. Can we start seeing that path to build new diagnostics, new prognostics, new biomarkers for this kind of early intervention in Alzheimer's?
The other aspect that we're looking at is mosaicism. We talked about the common variants and the rare variants, but in addition to those, as the initial cell, the zygote, divides and divides and divides, additional mutations happen with every cell division. So what you end up with is your brain being a mosaic of multiple different types of genetic... Some cells contain a mutation that other cells don't have. So every human has the common variants that all of us carry to some degree, the rare variants that your immediate branch of the human tree carries, and then the somatic variants, which arise along the tree of cell divisions after the zygote that forms your own body.
These somatic alterations have previously been inaccessible to study in human postmortem samples. But now, with the advent of single-cell RNA sequencing (in this particular case we're using well-based sequencing, which is much more expensive but gives you much richer information about each of those transcripts), we're using that richer information to infer the mutations that have happened in each of the thousands of genes that are active in these cells, and then to understand how the genome relates to function. This genotype-phenotype relationship is what we usually build in GWAS, genome-wide association studies, between genetic variation and disease. We're now building it at the cell level: for every cell, we can relate the unique, specific genome of that cell to the expression patterns of that cell and to its predicted function, using the predictive models that I mentioned before on this regulation, for cognition and for pathology in Alzheimer's, at the cell level.
And what we're finding is that the genes and genetic regions that are altered in common variants versus rare variants versus somatic variants are actually very different from each other. The somatic variants are pointing to neuronal energetics and oligodendrocyte functions that are not visible in the genetic regions you find for the common variants, probably because they have too strong an effect for evolution to tolerate them at the common end of the allele frequency spectrum.
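A toy version of inferring somatic mutations from per-cell reads, assuming reads have already been aligned and piled up at a single genomic position for each cell. The thresholds, the "somatic" versus "shared" labels and the function name `call_site` are illustrative assumptions; real single-cell variant calling must also handle sequencing error, allelic dropout and RNA editing.

```python
from collections import Counter

# Hypothetical helper for illustration; thresholds and labels are assumptions.
def call_site(pileups, ref_base, min_alt_reads=2, min_vaf=0.3):
    """Classify one genomic position across single cells.

    pileups: dict mapping cell ID -> string of bases observed at this
        position in that cell's reads (e.g. "AAGGG").
    ref_base: the reference base at this position.
    A cell carries the alternate allele if its most common non-reference base
    has at least `min_alt_reads` reads and makes up at least `min_vaf` of the
    cell's reads. Returns (mutant_cells, label): label is "somatic" if only
    some cells carry it, "shared" if every cell does, None if none do.
    """
    mutant_cells = set()
    for cell, bases in pileups.items():
        alt_counts = Counter(b for b in bases if b != ref_base)
        if not alt_counts:
            continue  # no non-reference reads in this cell
        _, alt_reads = alt_counts.most_common(1)[0]
        if alt_reads >= min_alt_reads and alt_reads / len(bases) >= min_vaf:
            mutant_cells.add(cell)
    if not mutant_cells:
        return mutant_cells, None
    label = "shared" if len(mutant_cells) == len(pileups) else "somatic"
    return mutant_cells, label
```

The split mirrors the distinction in the conversation: a variant seen in only a subset of a person's cells arose after the zygote, while one present in every cell is consistent with an inherited variant.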
So the somatic one, that's the variation that happens after the zygote, after you become an individual. I mean, this is a dumb question, but there's mutation and variation, I guess, that happens... And you're saying that through this, if we focus in on individual cells, we're able to detect the story that's interesting there, and that might be a very unique kind of important variability that arises for, you said, neuronal or something that would...
Energetics, sounds like a cool term.
So, I mean, the metabolism of humans is dramatically altered from that of nearby species. We talked about that last time: we are able to consume meat that is incredibly energy-rich, and that allows us to have functions that meet the needs of this humongous brain that we have. So on one hand, every one of our brain cells is much more energy efficient than those of our neighbors, our relatives. Number two, we have way more of these cells. And number three, we have this new diet that allows us to feed all these needs. That creates a massive amount of damage, oxidative damage, from this huge superpowered factory of ideas and thoughts that we carry in our skull. And that factory has energetic needs, and there are a lot of biological processes underlying that which we are finding are altered in the context of Alzheimer's disease.
That's fascinating. So you have to consider all of these systems if you want to understand even diseases that you would traditionally associate with just the particular cells of...
The immune system, the metabolic system.
And these are all the things that make us uniquely human. Our immune system is dramatically different from that of our neighbors. Our societies are so much more clustered. The history of infections that have plagued the human population is dramatically different from that of every other species. The way our society and our population have exploded has put unique pressures on our immune system. And our immune system has both coped with that density and been shaped by, as I mentioned, the vast amount of death that happened in the Black Plague and other selective events in human history: famines, ice ages, and so forth. So that's number one, on the immune side.
On the metabolic side, again, we are able to run marathons. I don't know if you remember the human versus horse experiment, where the horse actually tires out faster than the human and the human actually wins. So on the metabolic side, we're dramatically different. On the immune side, we're dramatically different. On the brain side, again, there's no need to say it; it's a no-brainer that our brain is just enormously more capable. And then on the side of cancer, the cancers that humans have and the environmental exposures are, again, dramatically different. And the expansion of human lifespan is unseen in any other species in recent evolutionary history.
And that now leads to a lot of new disorders that are starting to manifest late in life. Alzheimer's is one example, where these vast energetic needs over a lifetime of thinking can lead to all of this debris, eventually saturate the system, and lead to Alzheimer's in late life.
But there's just such a dramatic set of frontiers when it comes to aging research. What I often like to say is that if you want to engineer a car to go from 70 miles an hour to 120 miles an hour, that's fine; you can fix a few components. If you now want to go 400 miles an hour, you have to completely redesign the entire car, because the system has just not evolved to go that far. Our human body has only evolved to live to, I don't know, 120; maybe we can get to 150 with minor changes.
But as we start pushing these frontiers, not just for living but for living well, the eu zin (εὖ ζῆν) that we talked about last time, and push eu zin into the 80s and 90s and hundreds and much further than that, we will face new challenges that have never been faced before: in terms of cancer and the number of divisions, in terms of Alzheimer's and brain-related disorders, in terms of metabolic disorders, in terms of regeneration. There are just so many different frontiers ahead of us. So I am thrilled about where we're heading.
I see this confluence, in my lab and in many other labs, of AI, of the next frontier of AI for drug design: graph neural networks on specific chemical designs that allow you to create new generations of therapeutics; molecular biology tricks for intervening in the system at every level; personalized medicine prediction, diagnosis and prognosis using electronic health records and polygenic risk scores weighted by the burden, the number of mutations accumulating across common, rare and somatic variants, the burden converging across all of these different molecular pathways; and the delivery of specific drugs and specific interventions into specific... And again, you've talked with Bob Langer about this; there are many giants in that field.
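The standard additive polygenic risk score is a weighted sum of risk-allele dosages, and a mutation burden is, in its simplest form, a count of variants per pathway. This minimal sketch shows both pieces separately; how they would be combined ("weighted by the burden") is not specified in the conversation, so no combination rule is assumed. Function and argument names are hypothetical.

```python
from collections import Counter

# Hypothetical helpers for illustration.
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over scored variants of effect size (beta) times
    allele dosage (0, 1 or 2 copies of the risk allele). Variants without
    an effect size are ignored."""
    return sum(effect_sizes[v] * d for v, d in dosages.items() if v in effect_sizes)


def pathway_burden(mutated_genes, gene_to_pathway):
    """Count variants per pathway. mutated_genes lists one gene per observed
    variant (common, rare or somatic); genes with no pathway are skipped."""
    burden = Counter()
    for gene in mutated_genes:
        pathway = gene_to_pathway.get(gene)
        if pathway is not None:
            burden[pathway] += 1
    return dict(burden)
```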
And then the last concept is not intervening at the single-gene level. I want you to conceptualize the idea of an on-target side effect.
What is an on-target side effect?
An off-target side effect is when you design a molecule to target one gene and instead it targets another gene, and you have side effects because of that. An on-target side effect is when your molecule does exactly what you were expecting, but that gene is pleiotropic. Pleio means many, tropos means ways: it acts in many ways. It's a multifunctional gene. So you find that this gene plays a role in this, but as we talked about, the wiring of genes to phenotypes is extremely dense and extremely complex. So the next stage of intervention will be intervening not at the gene level, but at the network level.
That means intervening at the level of sets of pathways and sets of genes, with multi-input perturbations to the system, multi-input modulations, pharmaceutical or otherwise, that allow you to work at the full level of understanding: not just in your brain, but across your body; not just in one gene, but across the set of pathways, and so on and so forth, for every one of these disorders.
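The difference between an on-target side effect and network-level intervention can be illustrated on a toy gene-to-phenotype map. This minimal sketch assumes each gene is annotated with the set of phenotypes it influences (pleiotropy means that set has several members) and brute-forces small gene combinations that cover the intended phenotypes while touching as few others as possible. The map, the names and the objective are illustrative assumptions, not a method described in the conversation.

```python
from itertools import combinations

# Hypothetical helpers for illustration.
def on_target_side_effects(gene, gene_effects, intended):
    """Phenotypes a pleiotropic gene influences beyond the intended ones:
    hitting the gene exactly as designed still touches these."""
    return gene_effects[gene] - intended


def best_combination(gene_effects, intended, max_genes=2):
    """Brute-force search over gene sets of size 1..max_genes that cover every
    intended phenotype while touching the fewest unintended phenotypes.
    Returns (genes, side_effects), or None if no combination covers them."""
    best = None
    for k in range(1, max_genes + 1):
        for combo in combinations(sorted(gene_effects), k):
            touched = set().union(*(gene_effects[g] for g in combo))
            if not intended <= touched:
                continue  # misses part of the intended program
            side = touched - intended
            if best is None or len(side) < len(best[1]):
                best = (combo, side)
    return best
```

In this framing, targeting one highly pleiotropic gene can be worse than modulating two narrower genes together, which is the intuition behind multi-input interventions.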
So I think that we're finally at the level of systems medicine: instead of medicine being at the single-gene level, medicine is at the systems level, where it can be personalized based on the specific set of genetic markers and genetic perturbations that you were either born with or have developed during your lifetime, your unique set of exposures, your unique set of biomarkers, and your unique set of current conditions through your EHR and other ways. And then there's the precision component: intervening extremely precisely in the specific pathways and the specific combinations of genes that should be modulated to bring you from the disease state to the physiologically normal state, or even to a physiologically improved state, through this combination of interventions.
So that's, in my view, the field where computer science comes together with artificial intelligence, statistics, all of these other tools, molecular biology technologies, and biotechnology and pharmaceutical technologies that are revolutionary in the way... And of course, this massive amount of molecular biology and data gathering and generation and perturbation in massively parallel ways. So there's no better way. There's no better time. There's no better place to be looking at this whole confluence of ideas. And I'm just so thrilled to be a small part of this amazing, enormous ecosystem.
It's exciting to imagine what life will be like for humans 100 or 200 years from now, because these ideas seem to have the potential to transform the quality of life so much that, when they look back at us, they'll probably wonder how we put up with all the suffering. Manolis, it's a huge honor. Thank you for spending this early Sunday morning with me. I deeply appreciate it. See you next time.
Sounds like a plan.
Thanks for listening to this conversation with Manolis Kellis. And thank you to our sponsors: SEMrush, which is an SEO optimization tool; Pessimist Archive, which is one of my favorite history podcasts; 8Sleep, which is a self-cooling mattress with smart sensors and an app; and finally, BetterHelp, which is an online therapy service. Please check out these sponsors in the description to get a discount and to support this podcast. If you enjoy this thing, subscribe on YouTube, review it with 5 Stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
And now, let me leave you with some words from Haruki Murakami.
Human beings are ultimately nothing but carriers, passageways for genes. They ride us into the ground like racehorses from generation to generation. Genes don't think about what constitutes good or evil. They don't care whether we're happy or unhappy. We're just means to an end for them. The only thing they think about is what is most efficient for them.
Thank you for listening, and hope to see you next time.