Manolis Kellis: Biology of Disease | Lex Fridman Podcast #133
The following is a conversation with Manolis Kellis, his third time on the podcast. He is a professor at MIT and head of the MIT Computational Biology Group. This time, we went deep on the science, biology, and genetics. So this is a bit of an experiment. Manolis went back and forth between the basics of biology and the latest state of the art in the research. He's a master at this, so I just sat back and enjoyed the ride. This conversation happened at 7 a.m., so it's yet another podcast episode after an all-nighter for me. And once again, since the universe has a sense of humor, this one was a tough one for my brain to keep up with. But I did my best, and I never shy away from a good challenge.
Quick mention of each sponsor, followed by some thoughts related to the episode. First is SEMrush, the most advanced SEO optimization tool I've ever come across. I don't like looking at numbers, but someone probably should. It helps you make good decisions. Second is Pessimists Archive. They're back. One of my favorite history podcasts, on why people resist new things, from recorded music to umbrellas to cars, chess, coffee, and the elevator. Third is Eight Sleep, a mattress that cools itself, measures heart rate variability, has an app, and has given me yet another reason to look forward to sleep, including the all-important power nap. And finally, BetterHelp, online therapy for when you want to face your demons with a licensed professional, not just by doing David Goggins-like physical challenges like I seem to do on occasion. Please check out the sponsors in the description to get a discount and to support this podcast.
As a side note, let me say that biology in the brain and in the various systems of the body fills me with awe every time I think about how such a chaotic mess, coming from its humble origins in the ocean, was able to achieve such incredibly complex and robust mechanisms of life that survived despite all the forces of nature that want to destroy it. It is so unlike the computing systems we humans have engineered that it makes me feel that in order to create artificial general intelligence and artificial consciousness, we may have to completely rethink how we engineer computational systems.
If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman. And now, here's my conversation with Manolis Kellis.
So your group at MIT is trying to understand the molecular basis of human disease. What are some of the biggest challenges, in your view?
Don't get me started. I mean, understanding human disease is the most complex challenge in modern science.
So because human disease is as complex as the human genome, it is as complex as the human brain. And it is in many ways even more complex, because the more we understand disease complexity, the more we start understanding genome complexity and epigenome complexity and brain circuitry complexity and immune system complexity and cancer complexity, and so on and so forth. So traditionally, human disease was following basic biology. You would basically understand basic biology in model organisms like mouse and fly and yeast. You would understand mammalian biology and animal biology and eukaryotic biology in progressive layers of complexity, getting closer to human phylogenetically. And you would do perturbation experiments in those species to see: if I knock out a gene, what happens?
And based on the knocking out of these genes, you would then have a way to drive human biology, because you would understand the functions of these genes. And then if you find that a human gene locus, something that you've mapped from human genetics to that gene, is related to a particular human disease, you say, aha, now I know the function of the gene from the model organisms; I can now go and understand the function of that gene in human. But this is all changing. This has dramatically changed. So that was the old way of doing basic biology: you would start with the animal models, the eukaryotic models, the mammalian models, and then you would go to human. Human genetics has been so transformed in the last decade or two that human genetics is now actually driving the basic biology. There is more genetic mutation information in the human genome than there will ever be in any other species.
What do you mean by mutation information?
So perturbations are how you understand systems. An engineer builds systems, and then they know how they work from the inside out. A scientist studies systems through perturbations. You basically say: if I poke that balloon, what's going to happen? And I'm going to film it in super high resolution and understand, I don't know, air dynamics, or fluid dynamics if it's filled with water, etc. So you can then make experimentation by perturbation, and then the scientific process is building models that best fit the data, designing new experiments that best test your models and challenge your models, and so on and so forth. That's the same thing with science. Basically, if you're trying to understand biological science, you want to do perturbations that then drive the models.
So how do these perturbations allow you to understand disease?
So if you know that a gene is related to disease, you don't want to just know that it's related to the disease. You want to know what the disease mechanism is, because you want to go and intervene. So the way that I like to describe it is that traditionally epidemiology, which is basically the observational study of disease, has been about correlating one thing with another thing. So if you have a lot of people with liver disease who are also alcoholics, you might say, well, maybe the alcoholism is driving the liver disease, or maybe those who have liver disease self-medicate with alcohol. So the connection could be either way. With genetic epidemiology, it's about correlating changes in genome with phenotypic differences, and then you know the direction of causality.
So if you know that a particular gene is related to the disease, you can basically say: okay, perturbing that gene in mouse causes the mice to have phenotype X, so perturbing that gene in human causes the humans to have the disease. So I can now figure out what the detailed molecular phenotypes in the human are that are related to that organismal phenotype in the disease. So it's all about understanding disease mechanism, understanding what the pathways are, what the tissues are, what the processes are that are associated with the disease, so that we know how to intervene. You can then prescribe particular medications that also alter these processes. You can prescribe lifestyle changes that also affect these processes, and so on and so forth.
That's such a beautiful puzzle to try to solve: what kind of perturbations eventually have this ripple effect that leads to disease across the population? And then you study that for animals, in mice first, and then see how that might possibly connect to humans. How hard is that puzzle of trying to figure out how little perturbations might lead, in a stable way, to a disease?
In animals, we make the puzzle simpler because we perturb one gene at a time.
That's the beauty and that's the power of animal models. You can basically decouple the perturbations. You only do one perturbation at a time, and you only do strong perturbations. In human, the puzzle is incredibly complex, because obviously you don't do human experimentation. You wait for natural selection and natural genetic variation to basically do its own experiments, which it has been doing for hundreds and thousands of years in the human population and for hundreds of thousands of years across the history leading to the human population. You basically take this natural genetic variation that we all carry within us. Every one of us carries six million perturbations. I've done six million experiments on you, six million experiments on me, six million experiments on every one of seven billion people on the planet.
What does the six million correspond to?
Six million unique genetic variants that are segregating in the human population. Every one of us carries millions of polymorphic sites: poly, many; morph, forms. Polymorphic means many forms, variants. That basically means that every one of us has single-nucleotide alterations that we have inherited from Mom and from Dad that can be thought of as tiny little perturbations. Most of them don't do anything, but some of them lead to all of the phenotypic differences that we see between us. The reason why two twins are identical is because these variants completely determine the way that I'm going to look at exactly 93 years of age.
How happy are you with this kind of data set? Is it large enough, the human population of Earth? Is that too big, too small?
Yeah, so "is it large enough" is a power analysis question. In every one of our grants, we do a power analysis based on what the effect size is that I would like to detect and what the natural variation is in the two forms. Every time you do a perturbation, you're asking: I'm changing form A into form B. Form A has some natural phenotypic variation around it, and form B has some natural phenotypic variation around it. If those variances are large and the differences between the mean of A and the mean of B are small, then you have very little power. The further the means go apart, that's the effect size, the more power you have, and the smaller the standard deviation, the more power you have. So basically, when you're asking "is that sufficiently large?": certainly not for everything, but we already have enough power for many of the stronger effects in the more tight distributions.
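As a rough illustration of the power analysis described above, here is a minimal sketch, not the group's actual procedure. It assumes a two-sided z-test, equal group sizes, and a shared, known standard deviation for form A and form B; the function name `two_sample_power` and all numbers are hypothetical.

```python
from statistics import NormalDist


def two_sample_power(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing form A with form B.

    Assumes both forms share the same known standard deviation `sd`
    and that each group contains `n_per_group` individuals.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two group means.
    standard_error = sd * (2 / n_per_group) ** 0.5
    # Effect size expressed in units of the standard error.
    shift = abs(mean_a - mean_b) / standard_error
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical scenarios: (label, mean of B, standard deviation).
    scenarios = [
        ("small effect, wide spread", 0.2, 3.0),
        ("small effect, tight spread", 0.2, 1.0),
        ("large effect, tight spread", 0.5, 1.0),
    ]
    for label, mean_b, sd in scenarios:
        power = two_sample_power(0.0, mean_b, sd, n_per_group=100)
        print(f"{label}: power = {power:.3f}")
```

Under these assumptions, moving the means further apart or shrinking the standard deviation both raise the power, which matches the qualitative argument in the conversation.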
So that's a hopeful message, that there exist parts of the genome that have a strong effect that has a small variance?
That's exactly right. Unfortunately, those perturbations are the basis of disease in many cases. So it's not a hopeful message; sometimes it's a terrible message. It's basically: well, some people are sick. But if we can figure out what these contributors to sickness are, we can then help make them better, and help many other people get better who don't carry that exact mutation but who carry mutations on the same pathways. And that's what we like to call the allelic series of a gene. You basically have many perturbations of the same gene in different people, each with a different frequency in the human population and each with a different effect on the individual that carries them.
So you said in the past there would be these small experiments on perturbations in animal models. What does this puzzle-solving process look like today?
So we basically have something like seven billion people on the planet, and every one of them carries something like six million mutations. You basically have an enormous matrix of genotype by phenotype, by systematically measuring the phenotype of these individuals. And the traditional way of measuring this phenotype has been to look at one trait at a time. You would gather families, and you would paint the pedigrees of a strong-effect, what we like to call Mendelian, mutation: a mutation that gets transmitted in a dominant or a recessive but strong-effect form, where basically one locus plays a very big role in that disease. And you could then look at carriers versus non-carriers in one family, carriers versus non-carriers in another family, and do that for hundreds, sometimes thousands, of families, and then trace these inheritance patterns and figure out what the gene is that plays that role.
Is this the matrix that you're showing in talks or lectures?
So that matrix is the input to the stuff that I show in talks. Basically, that matrix has traditionally been strong-effect genes. What the matrix looks like now is, instead of pedigrees, instead of families, you basically have thousands and sometimes hundreds of thousands of unrelated individuals, each with all of their genetic variants and each with their phenotype, for example height or lipids or whether they're sick or not for a particular trait. That has been the modern view: instead of going to families, going to unrelated individuals with one phenotype at a time. And what we're doing now, as we're maturing in all of these sciences, is that we're doing this in the context of large medical systems or enormous cohorts that are very well phenotyped across hundreds of phenotypes, sometimes with their complete electronic health record. So you can now start relating not just one gene segregating in one family, not just thousands of variants segregating with one phenotype, but millions of variants versus hundreds of phenotypes. And as a computer scientist, I mean, deconvolving that matrix, partitioning it into the layers of biology that are associated with every one of these elements, is a dream come true. It's like the world's greatest puzzle. And you can now solve that puzzle by throwing in more and more knowledge about the function of different genomic regions and how these functions are changed across tissues and in the context of disease. And that's what my group and many other groups are doing. We're trying to systematically relate this genetic variation with molecular variation at the expression level of the genes, at the epigenomic level of the gene regulatory circuitry, and at the cellular level of what the functions are that are happening in those cells, at the single-cell level using single-cell profiling, and then relate all that vast amount of knowledge computationally with the thousands of traits that each of these thousands of variants are perturbing.
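To make the "variants versus phenotypes" matrix concrete, here is a toy sketch of an association scan. It is only an illustration under strong simplifying assumptions: plain Pearson correlation stands in for the regression models, covariates, and multiple-testing corrections that real studies use, and the names `pearson`, `association_scan`, `variant_1`, `trait_a`, and all data values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def association_scan(genotypes, phenotypes):
    """Correlate every variant with every trait.

    genotypes:  dict mapping variant name -> allele counts (0, 1, 2) per individual
    phenotypes: dict mapping trait name -> measured value per individual
    Returns a dict mapping (variant, trait) -> correlation coefficient.
    """
    return {
        (variant, trait): pearson(dosages, values)
        for variant, dosages in genotypes.items()
        for trait, values in phenotypes.items()
    }


# Made-up data: 6 individuals, 2 variants, 2 traits.
genotypes = {
    "variant_1": [0, 0, 1, 1, 2, 2],
    "variant_2": [0, 1, 0, 1, 0, 1],
}
phenotypes = {
    "trait_a": [1.0, 1.1, 2.0, 2.1, 3.0, 3.1],
    "trait_b": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}

scan = association_scan(genotypes, phenotypes)
# Pick the variant-trait pair with the strongest absolute correlation.
strongest = max(scan, key=lambda pair: abs(scan[pair]))
print(strongest, round(scan[strongest], 3))
```

The same nested loop over every variant and every trait is what makes the real problem so large: with millions of variants and hundreds of phenotypes, the number of cells in that matrix grows into the hundreds of millions.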
I mean, this is something we talked about, I think, last time. So there are these effects at different levels that happen. You said at a single-cell level you're trying to see things that happen due to certain perturbations. And so it's not just a puzzle of perturbation and disease; it's perturbation, then effect at the cellular level, then at an organ level. How do you disassemble this into what your group is working on? You're basically taking a bunch of the hard problems in the space. How do you break apart a difficult disease into puzzles that you can now start solving?
So there's a struggle here. Computer scientists love hard puzzles, and they're like, oh, I want to build a method that just deconvolves the whole thing computationally. And that's very tempting and it's very appealing, but biologists just like to decouple that complexity experimentally, to just peel off layers of complexity experimentally. And that's what many of these modern tools that my group and others have both developed and used are about: the fact that we can now figure out tricks for peeling off these layers of complexity by testing one cell type at a time or by testing one cell at a time. And you could basically ask: what is the effect of this genetic variant associated with Alzheimer's on human brain? Human brain sounds like, oh, it's an organ, of course, just go one organ at a time. But human brain has, of course, dozens of different brain regions, and within each of these brain regions dozens of different cell types. And every single type of neuron, every single type of glial cell, between astrocytes, oligodendrocytes, microglia, between all of the mural cells and the vascular cells and the immune cells that are co-inhabiting the brain, between the different types of excitatory and inhibitory neurons that are interacting with each other, between different layers of neurons in the cortical layers, every single one of these has a different type of function to play in cognition, in interaction with the environment, in maintenance of the brain, in energetic needs, in feeding the brain with blood, with oxygen, in clearing out the debris that results from the super high energy production of cognition in humans. So all of these things are basically potentially deconvolvable computationally, but experimentally you can just do single-cell profiling of dozens of regions of the brain across hundreds of individuals, across millions of cells, and then now you have pieces of the puzzle that you can then put back together to understand that complexity.
I mean, first of all, the cells in the human brain are the most... okay, maybe I'm romanticizing it, but cognition seems to be very complicated. So breaking Alzheimer's down to the cellular level seems very challenging. Is that basically what you're trying to do, find a way that some perturbation in the genome results in some obvious major dysfunction in the cell? You're trying to find something like that?
Exactly. So what does human genetics do? Human genetics basically looks at the whole path from genetic variation all the way to disease.
So human genetics has basically taken thousands of Alzheimer's cases and thousands of controls, matched for age, for sex, for environmental backgrounds, and so on and so forth, and then looked at that map where you're asking what the individual genetic perturbations are and how they are related all the way to Alzheimer's disease. And that has actually been quite successful. So we now have more than 27 different loci, these are genomic regions, that are associated with Alzheimer's at this end-to-end level. But the moment you break up that very long path into smaller levels, you can basically say, from genetics, what are the epigenomic alterations at the level of gene regulatory elements, where that genetic variant perturbs the control region nearby? That effect is much larger.
You mean much larger in terms of its down-the-line impact, or much larger in terms of the measurable effect?
This A versus B variance is actually so much more cleanly defined when you go to the shorter branches. Because for one genetic variant to affect Alzheimer's, that's a very long path. That basically means that in the context of these six million variants that every one of us carries, that one single nucleotide has a detectable effect all the way to the end. I mean, it's just mind-boggling that that's even possible. But indeed, there are such effects.
So the hope is, or scientifically speaking, the most effective place to detect the alteration that results in disease is earlier on in the pipeline, as early as possible?
It's a trade-off. If you go very early on in the pipeline, now each of these epigenomic alterations, for example this enhancer control region, is active maybe 50 percent less, which is a dramatic effect. Now you can ask: well, how much does changing one regulatory region in the genome in one cell type change disease? Well, that path is now long. So if you instead look at expression, the path between genetic variation and the expression of one gene goes through many enhancer regions, and therefore it's a subtler effect at the gene level. But then now you're closer, because one gene is acting in the context of only 20,000 other genes, as opposed to one enhancer acting in the context of two million other enhancers. So you basically now have genetic; epigenomic, the circuitry; transcriptomic, the gene expression level; and then cellular, where you can basically say I can measure various properties of those cells. What is the calcium influx rate when I have this genetic variation? What is the synaptic density? What is the electric impulse conductivity? And so on and so forth. So you can measure things along this path to disease, and you can also measure endophenotypes. You can basically measure your brain activity. You can do imaging in the brain. You can basically measure, I don't know, the heart rate, the pulse, the lipids, the amount of blood secreted, and so on and so forth. And then through all of that you can basically get at the path to causality, the path to disease.
And is there something beyond cellular? So you mentioned lifestyle interventions, or being able to prescribe changes in lifestyle. What about organs? What about the function of the body as a whole?
Yeah, absolutely. So basically, when you go to your doctor, they always measure your pulse, they always measure your height, they measure your weight, your BMI. So basically these are just very basic variables. But with digital devices nowadays, you can start measuring hundreds of variables for every individual. You can also phenotype cognitively through tests. For Alzheimer's patients, there are cognitive tests that you typically do for cognitive decline, these mini-mental observations where you have specific questions. You can think of enlarging the set of cognitive tests. So in the mouse, for example, you do experiments for how they get out of mazes, how they find food, whether they recall a fear, whether they shake in a new environment, and so on and so forth. In the human, you can have much, much richer phenotypes, where you can basically say not just imaging and all kinds of other activities at the organ level, but you can also do it at the organism level. You can do behavioral tests: how did they do on empathy, how did they do on memory, how did they do on long-term memory versus short-term memory, and so on and so forth.
I love how you're calling that phenotype. I guess it is. It is. But, like, your behavior patterns that might change over a period of a life, your ability to remember things, your ability to be empathetic or emotional, your intelligence perhaps even.
Yeah, but intelligence has hundreds of variables. It can be your math intelligence, your literary intelligence, your puzzle-solving intelligence, your logic. It could be hundreds of things.
And we're able to measure that better and better, so all that could be connected to the entire pipeline.
So we used to think of each of these as a single variable, like intelligence. I mean, that's ridiculous.
It's basically dozens of different genes that are controlling every single variable. You can basically think of, you know, imagine us in a video game where every one of us has measures of strength, stamina, energy left, and so on and so forth. But you could click on each of those five bars that are just the main bars, and each of those will then give you hundreds of bars. And you basically say, okay, great, for my machine learning task, I want a human who has these particular forms of intelligence; I require these 20 different things. And then you can combine those things and relate them to, of course, performance in a particular task. But you can also relate them to genetic variation that might be affecting different parts of the brain, for example your frontal cortex versus your temporal cortex versus your visual cortex, and so on and so forth. So genetic variation that affects expression of genes in different parts of your brain can basically affect your music ability, your auditory ability, your smell. Just dozens of different phenotypes can be broken down into hundreds of cognitive variables, and then you relate each of those to thousands of genes that are associated with them.
So as somebody who loves RPGs, role-playing games, there are too few variables that we can control. So I'm excited: if we're in fact living in a simulation, if this is a video game, I'm excited by the quality of the video game. The game designer did a hell of a good job, so we're impressed.
I don't know, the sunset last night was a little unrealistic.
Yeah, yeah, the graphics. Exactly. Come on, Nvidia.
To zoom back out, we've been talking about the genetic origins of diseases, but I think it's fascinating to talk about what the most important diseases to understand are, especially as it connects to the things that you're working on.
So it's very difficult to think about the most important diseases to understand. There are many metrics of importance. One is lifestyle impact. I mean, if you look at COVID, the impact on lifestyle has been enormous. So understanding COVID is important because it has impacted well-being in terms of the ability to have a job, the ability to have an apartment, the ability to go to work, the ability to have a mental circle of support, and all of that for millions of Americans. Huge, huge impact. So that's one aspect of importance. So basically mental disorders; Alzheimer's has a huge importance in the well-being of Americans. Whether or not it kills someone, for many, many years it has a huge impact. So the first measure of importance is just well-being.
Impact on the quality of life.
Impact on the quality of life, absolutely. The second metric, which is much easier to quantify, is deaths. What is the number one killer? The number one killer is actually heart disease. It is actually killing 650,000 Americans per year. Number two is cancer, with 600,000 Americans. Number three, far, far down the list, is accidents, every single accident combined. So basically, you read the news: accidents, like, there was a huge car crash, all over the news. But the number of deaths, number three by far: 167,000. Lower respiratory disease, so that's asthma, not being able to breathe, and so on and so forth: 160,000. Alzheimer's, number four, number five, with 120,000. And then stroke, brain aneurysms, and so on and so forth: that's 147,000. Diabetes and metabolic disorders, etc.: that's 85,000. The flu is 60,000, suicide 50,000, and then overdose, etc., goes further down the list. So of course COVID has crept up to be the number three killer this year, with more than 100,000 Americans and counting. But if you think about what the most important diseases are, you have to understand both the quality of life and the sheer number of deaths, and just the number of years lost, if you wish.
And each of these diseases, and also including terrorist attacks and school shootings, for example, things which lead to fatalities, you can look at as problems that could be solved. And some problems are harder to solve than others. I mean, that's part of the equation. So if you look at these diseases, if you look at heart disease or cancer or Alzheimer's, or schizophrenia and obesity, not necessarily things that kill you but that affect the quality of life, which problems are solvable, which aren't, which are harder to solve?
I love your question because you put it in the context of a global effort rather than just a local effort. So basically, if you look at the global aspect, exercise and nutrition are two interventions that we as a society can do a much better job at. If you think about the availability of cheap food, it's extremely high in calories, it's extremely detrimental for you, like a lot of processed food, etc. So if we changed that equation, and as a society we made the availability of healthy food much, much easier, and charged for a burger at McDonald's the price that it costs the health system, then people would actually start buying more healthy foods. So basically that's a societal intervention, if you wish. In the same way, increasing empathy, increasing education, increasing the social framework and support would basically lead to fewer suicides. It would lead to fewer murders. It would lead to fewer deaths overall. So that's something that we as a society can do. You can also think about external factors versus internal factors. The external factors are basically communicable diseases like COVID, like the flu, etc. And the internal factors are basically things like cancer and Alzheimer's, where basically your genetics will eventually drive you there.
um and then of course with all of these
link |
factors every single disease has both a genetic component
link |
and environmental component so heart disease you know huge genetic
link |
contributor contribution alzheimer's it's like you know
link |
60 plus genetic so i think it's like 79 percent
link |
heritability so that basically means that genetics alone
link |
explains 79 percent of alzheimer's incidents and yes there's a 21
link |
percent environmental component where you could basically
link |
enrich your cognitive environment enrich your social interactions
link |
read more books learn a foreign language go running
link |
you know sort of have a more fulfilling life all of that will actually decrease
link |
alzheimer's but there's a limit to how much that can
link |
impact because of the huge genetic footprint so this is fascinating so
link |
each one of these problems has a genetic component
link |
and an environment component and so like when there's a genetic component
link |
what can we do about some of these diseases what what have you worked on
link |
what can you say that's uh in terms of problems that are solvable here
link |
or understandable so my group works on the genetic component
link |
but i would argue that understanding the genetic component
link |
can have a huge impact even on the environmental component
link |
why is that because genetics gives us access to mechanism
link |
and if we can alter the mechanism if we can impact the mechanism
link |
we can perhaps counteract some of the environmental components oh
link |
interesting so understanding the biological mechanisms leading to
link |
disease is extremely important in being able to
link |
intervene but when you can intervene what you know the analogy that i like to
link |
give is for example for obesity you know think
link |
of it as a giant bathtub of fat there's basically fat coming in from your diet
link |
and there's fat coming out from your exercise
link |
okay that's an in out equation and that's the equation that everybody's
link |
focusing on but your metabolism impacts that you know bathtub
link |
basically your metabolism controls the rate at which
link |
you're burning energy it controls the rate at which you're storing
link |
energy and it also teaches you about the various
link |
valves that control the input and the output equation
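The bathtub picture can be sketched as a simple daily balance. This is only an illustration under assumed numbers: the function name, the units, and every rate below are hypothetical and do not come from the conversation. It shows how opening one extra metabolic "valve" changes the level even when diet and exercise stay the same.

```python
def simulate_fat_store(days, intake, exercise_burn, metabolic_burn, start=100.0):
    """Track a stored-fat level over time under a simple in/out balance.

    All quantities are in arbitrary, made-up units per day.
    """
    level = start
    history = [level]
    for _ in range(days):
        # Inflow from diet, outflow through exercise plus a metabolic "valve".
        level = max(0.0, level + intake - exercise_burn - metabolic_burn)
        history.append(level)
    return history


# Same diet and same exercise in both runs; only the metabolic valve differs.
baseline = simulate_fat_store(days=30, intake=3.0, exercise_burn=1.0, metabolic_burn=1.5)
extra_valve = simulate_fat_store(days=30, intake=3.0, exercise_burn=1.0, metabolic_burn=2.5)

print("baseline level after 30 days:", baseline[-1])
print("with an extra valve open:", extra_valve[-1])
```

In this toy setup the environment is identical in both runs, so any difference in the final level comes purely from the metabolic term.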
link |
so if we can learn from the genetics the valves
link |
we can then manipulate those valves and even if the environment is feeding you
link |
a lot of fat and getting a little fat out
link |
you can just poke another hole at the bathtub and just get a lot of the fat
link |
out yeah that's fascinating yeah so that we're not just
link |
passive observers of our genetics the more we understand the more we can come
link |
up with actual treatments and i think that's an important
link |
aspect to realize when people are thinking about
link |
strong effect versus weak effect variants so some variants have strong
link |
effects we talked about these Mendelian disorders where a single
link |
gene has a sufficiently large effect penetrance expressivity and so on so
link |
forth that basically you can trace it in families with
link |
cases and not cases cases not cases and so forth
link |
but even the you know but so these are the genes that everybody says oh
link |
that's the genes we should go after because that's a strong effect gene
link |
i like to think about it slightly differently these are the genes
link |
where genetic impacts that have a strong effect
link |
were tolerated because every single time we have a genetic
link |
association with disease it depends on two things
link |
number one the obvious one whether the gene has an impact on the disease
link |
number two the more subtle one is whether there is
link |
genetic variation standing and circulating and
link |
segregating in the human population that impacts that gene
link |
some genes are so darn important that if you mess with them
link |
even a tiny little amount that person is dead
link |
so those genes don't have variation you're not going to find a genetic
link |
association if you don't have variation that doesn't mean that the gene has no
link |
role it simply means
link |
that the gene tolerates no mutations so that's actually a strong signal when
link |
there's no variation that's so fascinating exactly
link |
genes that have very little variation are hugely important you can actually
link |
rank the importance of genes based on how little variation they have
link |
and those genes that have very little variation but no association
link |
with disease that's a very good metric to say oh that's probably a developmental
link |
gene because we're not good at measuring those phenotypes
link |
so it's genes that you can tell evolution has excluded
link |
mutations from but yet we can't see them associated with anything that we can
link |
measure nowadays it's probably early embryonic lethal
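Ranking genes by how little variation they tolerate can be sketched as follows. The observed-over-expected ratio used here is an assumption, chosen as one simple way to express "how little variation a gene has"; the conversation does not specify a metric. Gene names and counts are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneVariation:
    name: str
    observed: int    # damaging variants seen in the population (made-up counts)
    expected: float  # number expected if mutations were freely tolerated (made-up)


def constraint_ratio(gene: GeneVariation) -> float:
    """Observed / expected: values near 0 mean almost no variation is tolerated."""
    return gene.observed / gene.expected


def rank_by_constraint(genes: List[GeneVariation]) -> List[GeneVariation]:
    """Most constrained (least variation) first."""
    return sorted(genes, key=constraint_ratio)


genes = [
    GeneVariation("GENE_A", observed=2, expected=40.0),
    GeneVariation("GENE_B", observed=35, expected=38.0),
    GeneVariation("GENE_C", observed=0, expected=25.0),
]

ranked = [g.name for g in rank_by_constraint(genes)]
print(ranked)
```

A gene that lands at the top of such a ranking but shows no disease association would be the kind of candidate described above: possibly essential so early in development that no carriers are ever observed.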
link |
what are all the words you just said early embryonic what lethal
link |
meaning meaning that if you don't have that okay there's a bunch of stuff that
link |
is required for a stable functional organism
link |
across the board exactly our entire for for entire species I guess if you look
link |
at sperm it expresses thousands of proteins
link |
does sperm actually need thousands of proteins no
link |
but it's probably just testing them so my speculation
link |
is that misfolding of these proteins is an early
link |
test for failure so that out of the you know millions of sperm
link |
that are possible you select the subset that are just not grossly misfolding
link |
thousands of proteins so it's kind of an assert
link |
that this is folded correctly correct yeah this just
link |
because if this little thing about the folding of a protein isn't correct
link |
that probably means somewhere down the line there's a bigger issue
link |
that's exactly right so fail fast so basically if you look at the mammalian
link |
investment in a newborn that investment is
link |
enormous in terms of resources so mammals have basically evolved
link |
mechanisms for fail fast where basically in those
link |
early months of development I mean it's it's horrendous of course at the
link |
personal level when you lose a you know your future child
link |
but in some ways there's so little hope for that child to develop
link |
and sort of make it through the remaining months that sort of fail fast is
link |
probably a good evolutionary principle
link |
for mammals and of course humans have a lot of medical resources
link |
that you can sort of give those children a chance
link |
and you know we have so much more success in giving
link |
folks who have these strong carrier mutations a chance but if they're not
link |
even making it through the first three months we're not going to see them
link |
so that's why when we when we say what are the most important genes to focus on
link |
the ones that have a strong effect mutation or the ones that have a weak
link |
effect mutation well you know the jury might be out
link |
because the ones that have a strong effect mutation
link |
are basically you know not mattering that much
link |
the ones that only have weak effect mutations by understanding
link |
through genetics that they have a weak effect mutation
link |
and understanding that they have a causal role on the disease
link |
we can then say okay great evolution has only tolerated a two percent
link |
change in that gene pharmaceutically I can go in and induce a 70 percent
link |
change in that gene and maybe I will poke another
link |
hole at the bathtub that was not easy to control
link |
in you know many of the other sort of strong effect genetic variants
link |
so okay so there's this beautiful map of across the population of things that
link |
you're saying strong and weak effects so stuff with a lot of mutations and stuff
link |
with little mutations with no mutations any of this
link |
map and it lays out the puzzle yeah so so when I say strong effect I mean at
link |
the level of individual mutations so so basically genes where
link |
so so you have to think of first the effect of the gene on the disease
link |
remember how I was sort of painting that map earlier
link |
from genetics all the way to phenotype that gene
link |
can have a strong effect on the disease but the genetic variant
link |
might have a weak effect on the gene so basically
link |
when you ask what is the effect of that genetic variant
link |
on the disease it could be that that genetic variant impacts the gene by a
link |
lot and then the gene impacts the disease by
link |
a little or it could be that the genetic variant
link |
impacts the gene by a little and then the gene impacts the disease by a lot
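The two-step chain from variant to gene to disease can be written as a product of two effects. This is a sketch under a strong simplifying assumption (a linear, multiplicative model that the conversation does not claim), and the effect sizes are hypothetical apart from the 2 percent and 70 percent figures mentioned earlier.

```python
def effect_on_disease(change_in_gene: float, gene_effect_on_disease: float) -> float:
    """Assumed linear model: disease effect = (change in gene) x (gene's effect on disease)."""
    return change_in_gene * gene_effect_on_disease


# Variant changes the gene a lot, but the gene matters little for the disease.
strong_variant_weak_gene = effect_on_disease(change_in_gene=0.50, gene_effect_on_disease=0.04)

# Variant changes the gene by only 2 percent, but the gene matters a lot.
weak_variant_strong_gene = effect_on_disease(change_in_gene=0.02, gene_effect_on_disease=1.00)

# A drug that changes that same important gene by 70 percent.
drug_on_strong_gene = effect_on_disease(change_in_gene=0.70, gene_effect_on_disease=1.00)

print(strong_variant_weak_gene, weak_variant_strong_gene, drug_on_strong_gene)
```

Both variants produce the same small association signal in this toy model, yet only the second gene is a place where a pharmaceutical change would translate into a large effect on the disease.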
link |
so what we care about is genes that impact the disease a lot
link |
but genetics gives us the full equation and what I would argue
link |
is if we couple the genetics with expression variation
link |
to basically ask what genes change by a lot
link |
and you know which genes correlate with disease by a lot
link |
even if the genetic variants change them by a little
link |
then that those are the best places to intervene
link |
those are the best places where pharmaceutically if I have
link |
even a modest effect I will have a strong effect on the disease
link |
whereas those genetic variants that have a huge effect on the disease
link |
I might not be able to change that gene by this much without affecting all
link |
kinds of other things interesting so yeah okay so that's what
link |
we're looking at then what have we been able to find in terms of
link |
which disease could be helped again and don't get me started
link |
this is we have found so much our understanding of disease
link |
has changed so dramatically with genetics I mean places that we had no
link |
idea would be involved so one of the worst things about
link |
my genome is that I have a genetic predisposition to
link |
age related macular degeneration AMD so it's a form of blindness that causes
link |
you to lose the central part of your vision
link |
progressively as you grow older my increased risk
link |
is fairly small I have an eight percent chance you only have a six percent
link |
chance you I'm an average yeah by the way when
link |
you say my you mean literally yours you know this about you
link |
I know this about me yeah which is kind of
link |
I mean philosophically speaking is a pretty powerful thing
link |
so to live with I mean maybe that's so we agreed to talk again by the way for
link |
the listeners to where we're going to try to focus on science today and a
link |
little bit of philosophy next time but it's interesting
link |
to think about the more you're able to know about yourself from the genetic
link |
information in terms of the diseases how that changes
link |
your own view of life yeah so there's there's a lot of impact there
link |
and there's a something called genetic exceptionalism
link |
which basically thinks of genetics as something very very different
link |
than everything else as a type of determinism and you know let's talk
link |
about that next time so basically a good preview yeah
link |
so let's go back to AMD so basically with AMD
link |
we had no idea what causes AMD you know
link |
it was it was a mystery until the genetics were worked out
link |
and now the fact that I know that I have a predisposition
link |
allows me to sort of make some life choices number one
link |
but number two the genes that lead to that predisposition
link |
give us insights as to how does it actually work
link |
and that's a place where genetics gave us something totally unexpected
link |
so there's a complement pathway which is an immune function pathway
link |
that was in you know most of the loci associated with AMD
link |
and that basically told us that wow there's an immune basis
link |
to this eye disorder that people had just not expected before
link |
if you look at complement it was recently also
link |
implicated in schizophrenia and there's a type of microglia
link |
that is involved in synaptic pruning so synapses are the connections between
link |
neurons and in this whole use it or lose it view
link |
of mental cognition and other capabilities
link |
you basically have microglia which are immune cells that are sort of
link |
constantly traversing your brain and then pruning neuronal connections
link |
pruning synaptic connections that are not utilized
link |
so in schizophrenia there's thought to be a change in the pruning
link |
that basically if you don't prune your synapses the right way
link |
you will actually have an increased risk of schizophrenia this is something that
link |
was completely unexpected for schizophrenia of course we knew it
link |
has to do with neurons but the role of the complement complex
link |
which is also implicated in AMD which is now also implicated in schizophrenia
link |
was a huge surprise what's the complement complex
link |
so it's basically a set of genes the complement genes
link |
that are basically having various immune roles
link |
and as I was saying earlier our immune system has been coopted
link |
for many different roles across the body so they actually play many diverse
link |
roles somehow the immune system is connected to the
link |
synaptic pruning process exactly so immune cells were coopted
link |
to prune synapses how did you figure this out
link |
how does one go about figuring this intricate connection
link |
like pipeline of connections out yeah let me give you another example
link |
so so Alzheimer's disease the first place that you would expect it to act is
link |
obviously the brain so we had basically this
link |
roadmap epigenomics consortium view of the human epigenome the largest map of
link |
the human epigenome that has ever been built
link |
across 127 different tissues and samples
link |
with dozens of epigenomic marks measured in you know
link |
hundreds of donors so what we've basically learned through that
link |
is that you you basically can map what are the active
link |
gene regulatory elements for every one of the tissues in the body
link |
and then we connected these gene regulatory
link |
active maps of basically what regions of the human genome
link |
are turning on in every one of different tissues
link |
we then can go back and say where are all of the genetic loci that are
link |
associated with disease this is something that my group
link |
I think was the first to do back in 2010
link |
in this Ernst Nature biotech paper but basically
link |
we were for the first time able to show that specific chromatin states specific
link |
epigenomic states in that case enhancers were in fact
link |
enriched in disease associated variants
link |
we pushed that further in the Ernst Nature paper a year later
link |
and then in this roadmap epigenomics paper
link |
you know a few years after that but basically that matrix
link |
that you mentioned earlier was in fact the first time that we could see
link |
what genetic traits have genetic variants that are enriched
link |
in what tissues in the body and a lot of that map made complete
link |
sense if you looked at a diversity of immune
link |
traits like allergies and type 1 diabetes and so on so forth
link |
you basically could see that they were enriching
link |
that the genetic variants associated with those traits were enriched
link |
in enhancers in these gene regulatory elements
link |
active in T cells and B cells and hematopoietic stem cells and so on so forth
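One common way to ask whether trait-associated variants are enriched in the enhancers of a given tissue is a hypergeometric test. The sketch below is an assumption about how such an enrichment could be scored, not the method of the papers mentioned, and all counts are invented.

```python
from math import comb


def hypergeom_enrichment_pvalue(total_regions, tissue_regions, trait_regions, overlap):
    """P(X >= overlap) if trait_regions were drawn at random from total_regions,
    of which tissue_regions are active enhancers in the tissue of interest."""
    upper = min(tissue_regions, trait_regions)
    tail = sum(
        comb(tissue_regions, k) * comb(total_regions - tissue_regions, trait_regions - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_regions, trait_regions)


def fold_enrichment(total_regions, tissue_regions, trait_regions, overlap):
    """Observed overlap divided by the overlap expected by chance."""
    expected = trait_regions * tissue_regions / total_regions
    return overlap / expected


# Hypothetical counts: 1000 candidate regions, 100 active in T cells,
# 50 carrying variants for an immune trait, 20 of those in T cell enhancers.
fold = fold_enrichment(1000, 100, 50, 20)
p_value = hypergeom_enrichment_pvalue(1000, 100, 50, 20)
print(f"fold enrichment: {fold:.1f}, p-value: {p_value:.2e}")
```

Repeating this for every trait against every tissue would fill in a trait-by-tissue matrix of the kind described here.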
link |
so that basically gave us uh confirmation in many ways that those
link |
immune traits were indeed enriching in immune cells
link |
if you looked at type 2 diabetes
link |
you basically saw an enrichment in only one type of sample
link |
and it was pancreatic islets and we know that type 2 diabetes
link |
you know sort of stems from the dysregulation of insulin
link |
in the beta cells of pancreatic islets and that sort of was
link |
you know spot on super precise if you looked at blood pressure
link |
where would you expect blood pressure to occur
link |
you know I don't know maybe in your metabolism in ways that you process
link |
coffee or something like that maybe in your brain the way that you stress out
link |
and increases your blood pressure etc what we found is that blood pressure
link |
localized specifically in the left ventricle
link |
of the heart so the enhancers of the left ventricle in the heart
link |
contained a lot of genetic variants associated with blood pressure
link |
if you look at height we found an enrichment specifically
link |
in embryonic stem cell enhancers so the genetic variants predisposing you to
link |
be taller or shorter are in fact acting in developmental stem cells
link |
makes complete sense if you looked at inflammatory bowel disease
link |
you basically found inflammatory which is immune
link |
and also bowel disease which is digestive and indeed we show a double
link |
enrichment both in the immune cells and in the digestive
link |
cells so that basically told us that aha this is acting in both components
link |
there's an immune component in inflammatory bowel disease and there's a
link |
digestive component and the big surprise was for Alzheimer's
link |
we had seven different brain samples we found
link |
zero enrichment in the brain samples for genetic variants associated with
link |
Alzheimer's and this is mind boggling
link |
our brains were literally hurting what is going on
link |
and what is going on is that the brain samples
link |
are primarily neurons oligodendrocytes
link |
and astrocytes in terms of the cell types that make them up
link |
so that basically indicated that genetic variants associated with
link |
Alzheimer's were probably not acting in
link |
oligodendrocytes astrocytes or neurons so what could they be acting in
link |
well the fourth major cell type is actually microglia
link |
microglia are resident immune cells in your brain
link |
oh nice the immune oh wow and they are cd14 plus which is this
link |
sort of cell surface markers of those cells so they're cd14 plus cells just
link |
like macrophages that are circulating in your blood
link |
the microglia are resident monocytes
link |
that are basically sitting in your brain they're tissue specific
link |
monocytes and every one of your tissues like your fat for example
link |
has a lot of macrophages that are resident and the m1 versus m2 macrophage ratio
link |
has a huge role to play in obesity and you know so basically again these
link |
immune cells are everywhere but basically what we found
link |
through this completely unbiased view of what are the tissues that likely
link |
underlie different disorders we found that Alzheimer's
link |
was humongously enriched in microglia but not at all in the other cell types
link |
so what are we supposed to make of that if you look at the
link |
tissues involved is that simply useful for indication
link |
of propensity for disease or does it give us somehow a pathway of treatment
link |
it's very much the second if you look at the
link |
um the the way to therapeutics you have to start somewhere
link |
what are you going to do you're going to basically make assays
link |
that manipulate those genes and those pathways in those cell types so
link |
before we know the tissue of action we don't even know where to start
link |
we basically are at a loss but if you know the tissue of action
link |
and even better if you know the pathway of action
link |
then you can basically screen your small molecules
link |
not for the gene you can screen them directly for the pathway
link |
in that cell type so you can basically develop a high throughput multiplexed
link |
you know robotic system for testing the impact of your favorite molecules
link |
that you know are safe efficacious and you know sort of hit that particular
link |
gene and so on so forth you can basically screen those molecules
link |
against either a set of genes that act in that pathway
link |
or on the pathway directly by having a cellular assay
link |
and then you can basically go into mice and do experiments and basically sort
link |
of figure out ways to manipulate these processes
link |
that allow you to then go back to humans and do a clinical trial that
link |
basically says okay i was able indeed to reverse these processes in mice
link |
can i do the same thing in humans so the knowledge of the tissues
link |
gives you the pathway to treatment but that's not the only part
link |
there are many additional steps to figuring out the mechanism of disease
link |
i mean so that's really promising maybe uh to take a small step back you've
link |
mentioned all these puzzles that were figured out with the nature paper
link |
for me you've mentioned a ton of diseases
link |
from obesity to Alzheimer's even schizophrenia i think you mentioned
link |
is just uh what is the actual methodology of figuring this out
link |
so indeed i mentioned a lot of diseases and and my lab works on a lot of
link |
different disorders and the reason for that
link |
is that if you look at the
link |
if you look at biology it used to be you know zoology departments and
link |
botany departments and you know virology departments and so on and so
link |
forth and MIT was one of the first schools to basically create a biology
link |
department like oh we're gonna study all of life suddenly
link |
why was that even the case because the advent of DNA
link |
and the genome and the central dogma of DNA makes RNA makes protein
link |
in many ways unified biology you could suddenly
link |
study the process of transcription in viruses or in bacteria
link |
and have a huge impact on yeast and fly and maybe even mammals
link |
because of this realization of these common underlying processes
link |
and in the same way that DNA unified biology
link |
genetics is unifying disease studies so you used to have
link |
um you used to have uh you know i don't know um
link |
cardiovascular disease department and uh you know neurological disease
link |
department and you know the neurodegeneration department and
link |
you know um basically immune and cancer and so on so forth
link |
and all of these were studied in different labs
link |
you know because it made sense because we basically the first step was
link |
understanding how the tissue functions and we kind of knew the tissues
link |
involved in cardiovascular disease and so on so forth
link |
but what's happening with human genetics is that all of these
link |
walls and edifices that we had built are crumbling
link |
and the reason for that is that genetics
link |
is in many ways revealing unexpected connections
link |
so suddenly we now have to bring the immunologists
link |
to work on Alzheimer's they were never in the room
link |
they were in another building altogether the same way for schizophrenia
link |
we now have to sort of worry about all these interconnected
link |
aspects for metabolic disorders we're finding contributions from brain
link |
so suddenly we have to call the neurologists from the other building
link |
and so on so forth so in my view it makes no sense anymore
link |
to basically say oh i'm a geneticist studying immune disorders
link |
i mean that's that's ridiculous because i mean yeah of course in many ways
link |
you still need to sort of focus but what we're doing is that we're
link |
basically saying we'll go wherever the genetics takes us
link |
and by building these massive resources by working on
link |
our latest map is now 833 tissues sort of the the next generation of
link |
the epigenomics roadmap which we're now calling epimap
link |
is 833 different tissues and using those we've basically found
link |
enrichments in 540 different disorders those enrichments are not like oh
link |
great you guys work on that and we'll work on this
link |
they're intertwined amazingly so of course there's a lot of modularity
link |
but there's these enhancers that are sort of broadly active and these disorders
link |
that are broadly active so basically some enhancers are active in all tissues
link |
and some disorders are enriching in all tissues so basically there's these
link |
multifactorial and this other class which i like to call
link |
polyfactorial diseases which are basically lighting up everywhere
link |
and in many ways it's you know sort of cutting across these walls that were
link |
previously built across these departments and the polyfactorial ones were
link |
probably the previous structure of departments wasn't
link |
equipped to deal with those i mean again maybe it's a romanticized
link |
question but you know there's in physics there's a theory of everything
link |
do you think it's possible to move towards an almost
link |
theory of everything of disease from a genetic perspective
link |
so if this unification continues is it possible that
link |
like do you think in those terms like trying to arrive
link |
at a fundamental understanding of how disease emerges period
link |
that unification is not just foreseeable
link |
it's inevitable i see it as inevitable
link |
we have to go there you cannot be a specialist
link |
anymore if you're a genomicist you have to be a specialist
link |
in every single disorder and the reason for that is that
link |
the fundamental understanding of the circuitry of the human genome
link |
that you need to solve schizophrenia that fundamental circuitry is hugely
link |
important to solve Alzheimer's and that same circuitry is hugely
link |
important to solve metabolic disorders and that same exact circuitry
link |
is hugely important for solving immune disorders and cancer
link |
and you know every single disease so all of them
link |
have the same sub task and i teach dynamic programming in my class
link |
dynamic programming is all about sort of not redoing the work
link |
it's reusing the work that you do once so basically for us to say oh great
link |
you know you guys in the immune building go solve the fundamental circuitry of
link |
everything and then you guys in the schizophrenia building go
link |
solve the fundamental circuitry of everything separately
link |
is crazy so what we need to do is come together
link |
and sort of have a circuitry group the circuitry building that sort of tries
link |
to solve the circuitry of everything and then the immune folks
link |
will apply this knowledge to all of the disorders that are
link |
associated with immune dysfunction and the schizophrenia folks
link |
will basically be interacting with both the immune folks and with the neuronal
link |
folks and all of them will be interacting with the
link |
circuitry folks and so on so forth so that's sort of the current
link |
structure of my group if you wish so basically what we're doing is
link |
focusing on the fundamental circuitry but at the same time we're the users
link |
of our own tools by collaborating with
link |
many other labs in every one of these disorders that we mentioned
link |
we basically have a heart focus on cardiovascular disease coronary artery
link |
disease heart failure and so forth we have an immune focus on
link |
several immune disorders we have a cancer focus
link |
on uh metastatic melanoma and immunotherapy response
link |
we have a psychiatric disease focus on schizophrenia autism ptsd
link |
and other psychiatric disorders we have an Alzheimer's and neurodegeneration
link |
focus on Huntington's disease ALS and you know
link |
AD related disorders like frontotemporal dementia and Lewy body
link |
dementia and of course a huge focus on Alzheimer's
link |
we have a metabolic focus on the role of exercise
link |
and diets and sort of how they're impacting metabolic
link |
you know organs across the body and across many different tissues
link |
and all of them are interfacing with the circuitry
link |
and the reason for that is another computer science principle
link |
of eat your own dog food if everybody ate their own dog food
link |
dog food would taste a lot better the reason why
link |
microsoft excel and word and powerpoint were so important
link |
and so successful is because the employees
link |
that were working on them were using them for their day to day tasks
link |
you can't just simply build a circuitry and say
link |
here it is guys take the circuitry we're done without being the
link |
users of that circuitry because you then go back
link |
and because we span the whole spectrum from profiling the epigenome
link |
using comparative genomics finding the important nucleotides in the genome
link |
building the basic functional map of what are the genes in the human genome
link |
what are the gene regulatory elements of the human genome
link |
i mean over the years we've written a series of papers on how do you find human
link |
genes in the first place using comparative genomics
link |
how do you find the motifs that are the building blocks of gene regulation
link |
using comparative genomics how do you then find how these motifs come together
link |
and act in specific tissues using epigenomics how do you link regulators
link |
to enhancers and enhancers to their target genes
link |
using epigenomics and regulatory genomics so
link |
through the years we've basically built all this infrastructure
link |
for understanding what i like to say every single nucleotide of the human genome
link |
and how it acts in every one of the major cell types and tissues of the human body
link |
i mean this is no small task this is an enormous task that takes the entire
link |
field and that's something that my group has taken on
link |
along with many other groups and we have also
link |
and that's sort of the thing that sets my group perhaps apart we have also worked
link |
with specialists in every one of these disorders
link |
to basically further our understanding all the way down to disease
link |
and in some cases collaborating with pharma to go all the way down to
link |
therapeutics because of our deep deep understanding
link |
of that basic circuitry and how it allows us to now improve the
link |
circuitry not just treat it as a black box but
link |
basically go and say okay we need a better cell type specific wiring
link |
than we now have at the tissue specific level so we're focusing on that
link |
because we're understanding you know the needs from the disease front
link |
so you have a sense of the entire pipeline i mean one
link |
maybe you can indulge me one nice question to ask would be
link |
how do you from the scientific perspective go from knowing nothing
link |
about the disease to going you said to go into the
link |
entire pipeline and actually have a drug or or a treatment that cures that
link |
disease so that's an enormously long path
link |
and an enormously great challenge and what i'm trying to argue is that
link |
it progresses in stages of understanding
link |
rather than one gene at a time the traditional view of biology was
link |
you have one postdoc working on this gene and another postdoc working on that
link |
gene and they'll just figure out everything about
link |
that gene and that's their job what we've realized is how
link |
polygenic the diseases are so we can't have one postdoc per
link |
gene anymore we now have to have these cross cutting
link |
needs and i'm going to describe the path
link |
to circuitry along those needs and every single one of these paths
link |
we are now doing in parallel across thousands of genes
link |
so the first step is you have a genetic association
link |
and we talked a little bit about sort of the Mendelian path
link |
and the polygenic path to that association so the Mendelian path was
link |
looking through families to basically find gene regions
link |
and ultimately genes that are underlying particular disorders
link |
the polygenic path is basically looking at
link |
unrelated individuals in this giant matrix of genotype
link |
by phenotype and then finding hits where a particular variant
link |
impacts disease all the way to the end and then we now have a connection not
link |
between a gene and a disease but between a genetic region and a
link |
disease and that distinction is not understood by most people
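Finding a "hit" in the genotype by phenotype matrix amounts to testing each variant for a difference between cases and controls. A minimal sketch, assuming a plain allelic chi-square test on a 2x2 table of allele counts; real studies use more elaborate models, and the counts below are made up.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square statistic (1 degree of freedom) for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and disease status were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical variant with no association: identical allele counts in both groups.
no_signal = allelic_chi_square(50, 50, 50, 50)

# Hypothetical variant whose alternate allele is more common in cases.
signal = allelic_chi_square(60, 40, 40, 60)

print(no_signal, signal)
```

Note that a large statistic only flags the genetic region around the tested variant; it says nothing yet about which gene is affected.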
link |
so i'm going to explain it a little bit more
link |
why do we not have a connection between a gene
link |
and a disease but we have a connection between a genetic region and a disease
link |
the reason for that is that 93 percent of genetic variants
link |
that are associated with disease don't impact the protein at all
link |
so if you look at the human genome there's 20 000 genes
link |
there's 3.2 billion nucleotides only 1.5 percent of the genome
link |
codes for proteins the other 98.5 percent does not code for proteins
link |
if you now look at where are the disease variants located
link |
93 percent of them fall in that outside the gene's portion
link |
of course genes are enriched but they're only enriched by a factor of three
link |
that means that still 93 percent of genetic variants
link |
fall outside the proteins why is that difficult why is that a problem
link |
the problem is that when a variant falls outside the gene
link |
you don't know what gene is impacted by that variant you can't just say oh
link |
it's near this gene let's just connect that variant to the gene
link |
and the reason for that is that the genome circuitry
link |
is very often long range so you basically have that genetic variant
link |
that could sit in the intron of one gene and an intron is sort of the
link |
the place between the exons that code for proteins so proteins are split up
link |
into exons and introns and every exon codes for a particular subset of
link |
amino acids and together they're spliced together
link |
and then make the final protein so that genetic variant might be sitting in an
link |
intron of a gene it's transcribed with the gene
link |
it's processed and then excised but it might not impact this gene at all
link |
it might actually impact another gene that's a million nucleotides away so
link |
it's just riding along even though it has nothing to do with the
link |
with this nearby neighborhood that's exactly right
link |
let me give you an example the strongest genetic association with obesity
link |
was discovered in this FTO gene fat and obesity associated gene
link |
so this FTO gene was studied ad nauseam people did tons of experiments on it
link |
they figured out that FTO is in fact a RNA methylation
link |
transferase it basically it sort of impacts something that we know
link |
that we call the epitranscriptome just like the genome can be modified
link |
the transcriptome the transcripts of the genes
link |
can be modified and we basically said oh great that means that
link |
that epitranscriptomics is hugely involved in obesity because that
link |
that gene FTO is you know clearly where the genetic locus is at
link |
my group studied FTO in collaboration with you know a wonderful team led by
link |
Melina Claussnitzer and what we found is that this FTO locus
link |
even though it is associated with obesity
link |
does not implicate the FTO gene
link |
the genetic variant sits in the first intron of the FTO gene
link |
but it controls two genes IRX3 and IRX5 that are sitting
link |
1.2 million nucleotides away several genes away
link |
oh boy uh what am i supposed to feel about that because isn't that like super
link |
complicated then uh so so the way that i was introduced at a
link |
conference a few years ago was uh and here's Manolis Kellis who
link |
wrote the most depressing paper of 2015
link |
and the reason for that is that the entire pharmaceutical industry was so
link |
comfortable that there was a single gene in that locus
link |
because in some loci you basically have three dozen genes that are all sitting
link |
in the same region of association and you're like oh gosh which ones of those
link |
is it but even that question of which ones of
link |
those is it is making the assumption that it is
link |
one of those as opposed to some random gene just far far away
link |
which is what our paper showed so basically what our paper showed is that
link |
you can't ignore the circuitry you have to first figure out the circuitry
link |
all of those long range interactions how every genetic variant impacts the
link |
expression of every gene in every tissue imaginable
link |
across hundreds of individuals and then you now have
link |
one of the building blocks not even all of the building blocks for then going
link |
and understanding disease so okay so embrace the the
link |
wholeness of the circuitry correct but what so back to the question of
link |
starting from knowing nothing about the disease and going to the treatment
link |
so what are the next steps so you basically have to first figure out the
link |
tissue and then describe how you figure out the
link |
tissue you figure out the tissue by taking all of these
link |
noncoding variants that are sitting outside proteins
link |
and then figuring out what are the epigenomic enrichments
link |
and the reason for that you know thankfully is that there is convergence
link |
that the same processes are impacted in different ways by different loci
link |
and that's a saving grace for our field the fact that if I look at hundreds of
link |
genetic variants associated with Alzheimer's they localize
link |
in a small number of processes can you clarify why that's hopeful so like they
link |
show up in the same exact way in the in the specific set of
link |
processes yeah so so basically there's a small
link |
number of biological processes that underlie or at least that play the
link |
biggest role in every disorder so in Alzheimer's
link |
you basically have you know maybe 10 different types of processes
link |
one of them is lipid metabolism one of them is immune cell function
link |
one of them is neuronal energetics so these are just a small number of
link |
processes but you have multiple lesions
link |
multiple genetic perturbations that are associated with those processes
link |
so if you look at schizophrenia it's excitatory neuron function it's
link |
inhibitory neuron function it's synaptic pruning it's calcium
link |
signaling and so on so forth so when you look at disease genetics
link |
you have one hit here and one hit there and one hit there and one hit there
link |
completely different parts of the genome but it turns out all of those
link |
hits are calcium signaling proteins oh cool you're like
link |
aha that means that calcium signaling is important
link |
so those people who are focusing on one locus at a time cannot possibly
link |
see that picture you have to become a genomicist you have to
link |
look at the omics the ome the holistic picture
link |
to understand these enrichments but you you mentioned the convergence thing so
link |
the whatever the thing associated with the disease shows up
link |
so let me explain convergence yeah convergence is such a beautiful concept
link |
so you basically have these four genes
link |
that are converging on calcium signaling
link |
so that basically means that they are acting each in their own way
link |
but together in the same process but now
link |
in every one of these loci you have many
link |
enhancers controlling each of those genes that's another type of convergence
link |
where dysregulation of seven different enhancers might all converge
link |
on dysregulation of that one gene which then converges
link |
on calcium signaling and in each one of those enhancers
link |
you might have multiple genetic variants distributed across
link |
many different people everyone has their own different mutation
link |
but all of these mutations are impacting that enhancer and all of these
link |
enhancers are impacting that gene and all of these genes are impacting this
link |
pathway and all these pathways are acting in the same tissue
link |
and all of these tissues are converging together on the same biological process
link |
of schizophrenia and you're saying the saving grace
link |
is that that convergence seems to happen for a lot of these diseases
link |
for all of them basically that for every single disease that we've looked at
link |
we have found an epigenomic enrichment how do you do that
link |
you basically have all of the genetic variants associated with the disorder
link |
and then you're asking for all of the enhancers active in a particular tissue
link |
for 540 disorders we've basically found that
link |
indeed there is an enrichment that basically means that
link |
there is commonality and from the commonality we can just
link |
get insights so to explain in the mathematical terms
link |
we're basically building an empirical prior
link |
we're using a Bayesian approach to basically say great all of these variants
link |
are equally likely in a particular locus to be important
link |
in a genetic so in a genetic locus you basically have
link |
a dozen variants that are co inherited because the way that inheritance works
link |
in the human genome is through all of these recombination events
link |
during meiosis you basically have you know
link |
you inherit maybe chromosome three for example in your body
link |
is inherited from four different parts one part comes from your
link |
dad another part comes from your mom another part comes from your dad and
link |
another part comes from your mom so basically the way that it
link |
sorry from your mom's mom so you basically have one copy that
link |
comes from your dad and one copy that comes from your mom but that copy that
link |
you got from your mom is a mixture of her maternal
link |
and her paternal chromosome and the copy that you got from your dad is a
link |
mixture of his maternal and his paternal chromosome. So these breakpoints that happen
link |
when chromosomes are lining up are basically ensuring through these crossover events,
link |
they're ensuring that every child cell during the process of meiosis where you basically have
link |
one spermatozoid that basically couples with one ovule to basically create one egg to basically
link |
create the zygote. You basically have half of your genome that comes from dad and half your
link |
genome that comes from mom, but in order to line them up, you basically have these crossover
link |
events. These crossover events are basically leading to co inheritance of that entire block
link |
coming from your maternal grandmother and that entire block coming from your maternal grandfather.
link |
Over many generations, these crossover events don't happen randomly. There's a protein called
link |
PRDM9 that basically guides the double stranded breaks and then leads to these crossovers
link |
and that protein has a particular preference to only a small number of hotspots of recombination
link |
which then leads to a small number of breaks between these co inheritance patterns. So even
link |
though there are six million variants, there are six million loci, this variation is inherited in
link |
blocks and every one of these blocks has like two dozen genetic variants that are all associated.
link |
So in the case of FTO, it wasn't just one variant, it was 89 common variants that were all
link |
humongously associated with obesity. Which ones of those is the important one? Well, if you look
link |
at only one locus, you have no idea, but if you look at many loci, you basically say, aha,
link |
all of them are enriching in the same epigenomic map. In that particular case, it was mesenchymal
link |
stem cells. So these are the progenitor cells that give rise to your brown fat and your white fat.
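The enrichment idea described here can be sketched in a few lines. This is only an illustrative toy, not the group's actual method: the function names, the fold-based reweighting, and every number below are assumptions made up for the example. It first asks whether disease variants overlap tissue enhancers more often than expected by chance (a hypergeometric tail), then uses that enrichment to reweight the otherwise equally likely variants inside one co-inherited block.

```python
from math import comb

def hypergeom_tail(total, annotated, drawn, hits):
    """P(X >= hits) when `drawn` variants are sampled from `total`,
    of which `annotated` overlap enhancers of the tissue being tested."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, k) * comb(total - annotated, drawn - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, drawn)

def reweight_locus(in_enhancer, fold):
    """Start from equal weights for every variant in the block, multiply the
    weight of variants inside an enriched enhancer by `fold`, renormalize."""
    raw = [fold if flag else 1.0 for flag in in_enhancer]
    total = sum(raw)
    return [w / total for w in raw]

# Made-up numbers: 1000 background variants, 100 of them in the tissue's
# enhancers; 20 disease-associated variants, 8 of which land in enhancers.
p_value = hypergeom_tail(total=1000, annotated=100, drawn=20, hits=8)
fold = (8 / 20) / (100 / 1000)  # observed fraction over expected fraction

# One block of five co-inherited variants; only the third sits in an enhancer.
weights = reweight_locus([False, False, True, False, False], fold)
print(p_value, fold, weights)
```

With a single locus every variant would keep the same weight; it is only the enrichment learned across many loci that lets one variant in the block stand out.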
link |
Progenitors like the early on developmental stem cells.
link |
So you start from one zygote and that's a totipotent cell type. It can do anything.
link |
You then, you know, that cell divides, divides, divides, and then every cell division is leading
link |
to specialization where you now have a mesodermal lineage and ectodermal lineage and endodermal
link |
lineage that basically leads to different parts of your body. The ectoderm will basically give
link |
rise to your skin. Ecto means outside, derm is skin. So ectoderm, but it also gives rise to
link |
your neurons and your whole brain. So that's a lot of ectoderm. Mesoderm gives rise to your
link |
internal organs, including the vasculature and, you know, your muscle and stuff like that. So
link |
you basically have this progressive differentiation. And then if you look further, further down that
link |
lineage, you basically have one lineage that will give rise to both your muscle and your bone,
link |
but also your fat. And if you go further down the lineage of your fat, you basically have your
link |
white fat cells. These are the cells that store energy. So when you eat a lot, but you don't
link |
exercise too much, there's an excess set of calories, excess energy. What do you do with those?
link |
You basically create, you spend a lot of that energy to create these high energy molecules,
link |
lipids, which you can then burn when you need them on a rainy day. So that leads to obesity
link |
if you don't exercise and if you overeat, because your body is like, oh, great, I have all these
link |
calories. I'm going to store them. Ooh, more calories. I'm going to store them too. Ooh,
link |
more calories. And the, you know, 42% of European chromosomes have a predisposition to storing fat,
link |
which was selected probably in the, you know, food scarcity periods. Like basically as we were
link |
exiting Africa, you know, before and during the ice ages, you know, there was probably a selection
link |
to those individuals who made it north to basically be able to store energy, you know, a lot more
link |
energy. So you basically now have this lineage that is deciding whether you want to store energy
link |
in your white fat or burn energy in your beige fat. Turns out that your fat is, you know, like
link |
we, we have such a bad view of fat. Fat is your best friend. Fat can both store all these excess
link |
lipids that would be otherwise circulating through your, you know, body and causing damage,
link |
but it can also burn calories directly. If you have too much of energy, you can just choose to
link |
just burn some of that as heat. So basically when you're cold, you're burning energy to basically
link |
warm your body up and you're burning all these lipids and you're burning all these calories.
link |
Burning all these calories. So what we basically found is that across the board, genetic variants
link |
associated with obesity across many of these regions were all enriched repeatedly in mesenchymal
link |
stem cell enhancers. So that gave us a hint as to which of these genetic variants was likely
link |
driving this whole association. And we ended up with this one genetic variant called rs1421085.
link |
And that genetic variant out of the 89 was the one that we predicted to be causal for the
link |
disease. So going back to those steps, first step is figure out the relevant tissue based on the
link |
global enrichment. Second step is figure out the causal variant among many variants in this
link |
linkage disequilibrium in this co inherited block between these recombination hotspots,
link |
these boundaries of these inherited blocks. That's the second step. The third step is once you know
link |
that causal variant, try to figure out what is the motif that is disrupted by that causal variant.
link |
Basically, how does it act? Variants don't just disrupt elements, they disrupt the binding of
link |
specific regulators. So basically the third step there was how do you find the motif that is responsible
link |
like the gene regulatory word, the building block of gene regulation that is responsible for that
link |
dysregulatory event. And the fourth step is finding out what regulator normally binds that motif
link |
and is now no longer able to bind. And then once you have the regulator, can you then try to figure
link |
out how to, well, develop a way to fix it? That's exactly right. You now know how to intervene.
link |
You have basically a regulator, you have a gene that you can then perturb and you say, well,
link |
maybe that regulator has a global role in obesity. I can perturb the regulator.
link |
Just to clarify, when we say perturbed, on the scale of a human life, can a human being be helped?
link |
Of course. Of course. Yeah. I guess understanding is the first step.
link |
No, no, but perturbed basically means you now develop therapeutics, pharmaceutical therapeutics
link |
against that. Or you develop other types of intervention that affect the expression of that
link |
gene. What do pharmaceutical therapeutics look like when your understanding is on a genetic level?
link |
Yeah. Sorry if it's a dumb question. No, no, no. It's a brilliant question,
link |
but I want to save it for a little bit later when we start talking about therapeutics.
link |
Perfect. We've talked about the first four steps. There's two more. So basically the first step is
link |
figure out the zero step. The starting point is the genetics. The first step after that is figure
link |
out the tissue of action. The second step is figuring out the nucleotide that is responsible
link |
or set of nucleotides. The third step is figuring out the motif and the upstream regulator,
link |
number four. Number five and six is what are the targets? So number five is great.
link |
Now I know the regulator, I know the motif, I know the tissue, and I know the variant.
link |
What does it actually do? So you have to now trace it to the biological process
link |
and the genes that mediate that biological process. So knowing all of this can now allow
link |
you to find the target genes. How? By basically doing perturbation experiments or by looking
link |
at the folding of the epigenome or by looking at the genetic impact of that genetic variant
link |
on the expression of genes. And we use all three. So let me go through them. Basically one of them
link |
is physical links. This is the folding of the genome onto itself. How do you even figure out
link |
the folding? It's a little bit of a tangent, but it's a super awesome technology. Think of the
link |
genome as again this massive packaging that we talked about of taking two meters worth of DNA
link |
and putting it in something that's a million times smaller than two meters worth of DNA,
link |
that's a single cell. You basically have this massive packaging and this packaging basically
link |
leads to the chromosome being wrapped around in sort of tight, tight ways. In ways, however,
link |
that are functionally capable of being reopened and reclosed. So I can then go in and figure out
link |
that folding by sort of chopping up the spaghetti soup, putting glue and ligating the segments that
link |
were chopped up but nearby each other, and then sequencing through these ligation events to figure
link |
out that this region of this chromosome, that region of the chromosome were near each other,
link |
that means they were interacting, even though they were far away on the genome itself.
link |
So that chopping up sequencing and regluing is basically giving you folds of the genome that
link |
we call. Sorry, can you backtrack? Of course. How does cutting it help you figure out
link |
which ones were close in the original folding? So you have a bowl of noodles.
link |
Go on. And in that bowl of noodles, some noodles are near each other. Yes.
link |
So throw in a bunch of glue, you basically freeze the noodles in place,
link |
throw in a cutter that chops up the noodles into little pieces.
link |
Now, throw in some ligation enzyme that lets those pieces that were free
link |
religate near each other. In some cases, they'll religate what you had just cut,
link |
but that's very rare. Most of the time, they will religate in whatever was proximal. You now have
link |
glued the red noodle that was crossing the blue noodle to each other. You then reverse the glue,
link |
the glue goes away, and you just sequence the heck out of it. Most of the time,
link |
you'll find red segment with, you know, red segment, but you can specifically select for
link |
ligation events that have happened that were not from the same segment by sort of marking
link |
them in a particular way. And then selecting those, and then you sequence and you look for
link |
red with blue matches of sort of things that were glued that were not immediate proximal to
link |
each other. And that reveals the linking of the blue noodle and the red noodle. You're with me
link |
so far? Yeah. Good. So we, you know, we've done these experiments. That's physical. That's physical.
link |
That's step one of the physical. And what the physical revealed is topologically associated
link |
domains, basically big blocks of the genome that are topologically connected together.
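The noodle picture above can be turned into a toy counting exercise. The sketch below is an assumption-laden illustration, not a real Hi-C pipeline: `contact_counts`, the bin size, and the coordinates are all hypothetical. Each ligation event is a pair of genome positions that ended up glued together; pairs that are close on the linear chromosome ("red with red") are dropped, and the rest are tallied per pair of bins.

```python
from collections import Counter

def contact_counts(ligation_pairs, bin_size, min_separation):
    """Count ligation events between genome bins that are far apart
    on the linear chromosome but were glued together in 3D."""
    counts = Counter()
    for pos_a, pos_b in ligation_pairs:
        if abs(pos_a - pos_b) < min_separation:
            continue  # same segment or immediate neighbours: uninformative
        bin_a, bin_b = sorted((pos_a // bin_size, pos_b // bin_size))
        counts[(bin_a, bin_b)] += 1
    return counts

# Made-up sequenced ligation events on one chromosome (positions in nucleotides).
pairs = [
    (1_000, 1_500),        # red with red, filtered out
    (10_000, 1_210_000),   # long-range contact
    (12_000, 1_205_000),   # same long-range contact, seen again
    (50_000, 900_000),     # a different long-range contact
]
contacts = contact_counts(pairs, bin_size=100_000, min_separation=20_000)
print(contacts)
```

Bins that keep showing up together, such as bins 0 and 12 here, are the candidates for regions that fold onto each other.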
link |
That's the physical. The second one is the genetic links. It basically says,
link |
across individuals that have different genetic variants,
link |
how are their genes expressed differently? Remember, before I was saying that the path
link |
between genetics and disease is enormous, but we can break it up to look at the path between
link |
genetics and gene expression. So instead of using Alzheimer's as the phenotype, I can now use
link |
expression of IRX3 as the phenotype, expression of gene A. And I can look at all of the humans
link |
who contain a G at that location and all the humans who contain a T at that location.
link |
And basically say, wow, turns out that the expression of the gene is higher for the T
link |
humans than for the G humans at that location. So that basically gives me a genetic link between
link |
a genetic variant, a locus, a region, and the expression of nearby genes. Good on the genetic
link |
link? I think so. Awesome. So the third link is the activity link. What's an activity link?
link |
It basically says, if I look across 833 different epigenomes, whenever this enhancer is active,
link |
this gene is active. That gives me an activity link between this region of the DNA and that gene.
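The genetic link and the activity link can both be sketched with very little code. Everything here is a hedged toy: the helper names and the data are invented for illustration, and a real analysis would use regression with covariates and many more samples. The genetic link compares expression between T carriers and G carriers; the activity link correlates enhancer activity with gene activity across samples.

```python
from statistics import mean

def genetic_link(alleles, expression):
    """Mean expression of T carriers minus mean expression of G carriers."""
    by_allele = {"T": [], "G": []}
    for allele, level in zip(alleles, expression):
        by_allele[allele].append(level)
    return mean(by_allele["T"]) - mean(by_allele["G"])

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Genetic link: one allele per individual at the variant, plus the
# expression of a nearby gene in the same individuals (made-up values).
alleles = ["T", "T", "G", "G", "T", "G"]
expression = [9.0, 8.0, 5.0, 4.0, 10.0, 6.0]
effect = genetic_link(alleles, expression)

# Activity link: enhancer activity and gene activity across five
# hypothetical epigenomes (made-up values).
enhancer_activity = [0.1, 0.9, 0.8, 0.2, 0.7]
gene_activity = [0.2, 1.0, 0.7, 0.1, 0.9]
activity_r = pearson(enhancer_activity, gene_activity)
print(effect, activity_r)
```

Both numbers are correlational; they suggest a link between the region and the gene but do not show that one causes the other.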
link |
And then the fourth one is perturbations, where I can go in and blow up that region and see what
link |
are the genes that change in expression. Or I can go in and over activate that region and see what
link |
genes change in expression. So I guess that's similar to activity? Yeah. Yeah. So that's
link |
similar to activity. I agree, but it's causal rather than correlational. Again, I'm a little weird.
link |
No, no, you're 100% on. It's exactly the same as activity perturbation where I go and intervene.
link |
Yes. I basically take a bunch of cells. So you know CRISPR, right? CRISPR is this
link |
genome guidance and cutting mechanism. It's what George Church likes to call genome vandalism.
link |
So you basically are able to, you can basically take a guide RNA that you put into the CRISPR
link |
system and the CRISPR system will basically use this guide RNA, scan the genome, find wherever
link |
there's a match and then cut the genome. So, you know, I digress, but it's a bacterial immune
link |
defense system. So basically bacteria are constantly attacked by viruses, but sometimes they win
link |
against the viruses and they chop up these viruses and remember as a trophy inside their genome,
link |
they have this locus, this CRISPR locus, that basically stands for clustered repeats,
link |
interspersed, et cetera. So basically it's an interspersed repeats structure where basically
link |
you have a set of repetitive regions and then interspersed were these variable segments
link |
that were basically matching viruses. So when this was first discovered, it was basically
link |
hypothesized that this is probably a bacterial immune system that remembers the trophies of
link |
the viruses that they managed to kill. And then the bacteria pass on, you know, they sort of do lateral
link |
transfer of DNA and they pass on these memories so that the next bacterium says, oh, you killed
link |
that guy, when that guy shows up again, I will recognize him. And the CRISPR system was basically
link |
evolved as a bacterial adaptive immune response to sense foreigners that should not belong and to
link |
just go and cut their genome. So it's an RNA guided RNA cutting enzyme or an RNA guided DNA
link |
cutting enzyme. So there's different systems, some of them cut DNA, some of them cut RNA,
link |
but all of them remember this sort of viral attack. So what we have done now as a field is,
link |
you know, through the work of, you know, Jennifer Doudna, Emmanuelle Charpentier, Feng Zhang and many
link |
others is coopted that system of bacterial immune defense as a way to cut genomes. You basically
link |
have this guiding system that allows you to use an RNA guide to bring enzymes to cut DNA at a
link |
particular locus. That's so fascinating. So this is like already a natural mechanism, a natural tool
link |
for cutting that was useful in this particular context. And we're like, well, we can use that
link |
thing to actually, it's a nice tool that's already in the body. Yeah, yeah, it's not in our body,
link |
it's in the bacterial body. It was discovered by the, by the yogurt industry. They were trying to
link |
make better yogurts and they were trying to make their bacteria in their yogurt cultures
link |
more resilient to viruses. And they were studying bacteria and they found that, wow,
link |
this CRISPR system is awesome. It allows you to defend against that. And then it was coopted in
link |
mammalian systems that don't use anything like that as a, as a, as a targeting way to basically
link |
bring these DNA cutting enzymes to any locus in the genome. Why would you want to cut DNA
link |
to do anything? The reason is that our DNA has a DNA repair mechanism where if a region of the
link |
genome gets randomly cut, you will basically scan the genome for anything that matches
link |
and sort of use it by homology. So the reason why we're diploid is because we now have a spare
link |
copy. As soon as my mom's copy is deactivated, I can use my dad's copy. And somewhere else,
link |
if my dad's copy is deactivated, I can use my mom's copy to repair it. So this is called homology
link |
based repair. So all you have to do is the cutting and you don't have to do the fixing.
link |
That's exactly right. You don't have to do the fixing because it's already built in.
link |
That's exactly right. But the fixing can be coopted by throwing in a bunch of homologous
link |
segments that instead of having your dad's version, have whatever other version you'd like to use.
link |
So the, so you, you then control the fixing by throwing in a bunch of other stuff.
link |
That's exactly right. And that's how you do genome editing. So that's what CRISPR is.
link |
That's what in popular culture, people use the term. I've never, wow, that's brilliant.
link |
So CRISPR is genome vandalism followed by a bunch of band aids that have the sequence that you'd
link |
like. And you can control the, the choices of band aids. Correct. And of course, there's new
link |
generations of CRISPR. There's something that's called prime editing that was sort of very,
link |
very much in the press recently, that basically instead of sort of making a double stranded break,
link |
which again is genome vandalism, you basically make a single stranded break.
link |
You basically just nick one of the two strands, enabling you to sort of peel off without sort
link |
of completely breaking it up and then repair it locally using a guide that is coupled to your
link |
initial RNA that took you to that location. Dumb question, but is CRISPR as awesome and
link |
cool as it sounds? I mean, technically speaking, in terms of like, as a tool for manipulating our
link |
genetics in the positive meaning of the word manipulating, or is there downsides,
link |
drawbacks in this whole context of therapeutics that we're talking about or understanding and
link |
so on. So, when I teach my students about CRISPR, I show them articles with the headline
link |
genome editing tool revolutionizes biology. And then I show them the date of these tools,
link |
of these articles, and they're 2004, like five years before CRISPR was invented.
link |
And the reason is that they're not talking about CRISPR. They're talking about zinc finger enzymes
link |
that are another way to bring these cutters to the genome. It's a very difficult way of sort of
link |
designing the right set of zinc finger proteins, the right set of amino acids that will now target
link |
a particular long stretch of DNA. Because, you know, for every location that you want to target,
link |
you need to design a particular regulator, a particular protein that will match that region
link |
well. There's another technology called TALENs, which are basically, you know, just a different
link |
way of using proteins to sort of, you know, guide these cutters to a particular location of the
link |
genome. These require a massive team of engineers, of biological engineers to basically design a set
link |
of amino acids that will target a particular sequence of your genome. The reason why CRISPR is
link |
amazingly, awesomely revolutionary is because instead of having this team of engineers
link |
design a new set of proteins for every location that you want to target, you just type it in
link |
your computer and you just synthesize an RNA guide. The beauty of CRISPR is not the cutting,
link |
it's not the fixing. All of that was there before. It's the guiding, and the only thing that changes
link |
is that it makes the guiding easier by sort of, you know, just typing in the RNA sequence,
link |
which then allows the system to sort of scan the DNA to find that.
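The "just type it in your computer" step can be illustrated with a toy scan for where a guide would land. This is a sketch under stated assumptions, none of which come from the conversation: exact matching only, the guide written as DNA, and the commonly cited NGG motif required right after the match for Cas9 from Streptococcus pyogenes. The function names and sequences are hypothetical, and real guide design also has to weigh near-matches elsewhere in the genome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_targets(genome, guide_dna, pam_len=3):
    """Return (strand, start) for every exact guide match that is
    immediately followed by an NGG motif. Starts on the '-' strand
    are coordinates within the reverse-complemented sequence."""
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        start = seq.find(guide_dna)
        while start != -1:
            end = start + len(guide_dna)
            pam = seq[end:end + pam_len]
            if len(pam) == pam_len and pam[1:] == "GG":
                hits.append((strand, start))
            start = seq.find(guide_dna, start + 1)
    return hits

# Made-up 20-nucleotide guide and a tiny made-up "genome" containing it twice:
# once followed by AGG (accepted) and once followed by ATT (rejected).
guide = "ACGTTGCAAGCTTAGGCATC"
genome = "TTTT" + guide + "AGG" + "CCCC" + guide + "ATT"
targets = find_targets(genome, guide)
print(targets)
```

Retargeting means changing one string, which is the contrast with designing a new protein for every location.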
link |
So the coding, the engineering of the cutter is easier on the, in terms of, that's kind of similar
link |
to the story of deep learning versus old school machine learning. Some of the challenging parts
link |
are automated. Okay, so, but CRISPR is just one cutting technology. And then there's,
link |
that's part of the challenges and exciting opportunities of the field is to design different
link |
cutting technologies. Yeah. So now, we, you know, this was a big parenthesis on CRISPR.
link |
But now you, you know, when we were talking about perturbations, you basically now have the ability
link |
to not just look at correlation between enhancers and genes, but actually go and either
link |
destroy that enhancer and see if the gene changes in expression, or you can use the CRISPR targeting
link |
system to bring in not vandalism and cutting, but you can couple the CRISPR system with,
link |
and the CRISPR system is called usually CRISPR Cas9, because Cas9 is the protein that will then come
link |
and cut. But there's a version of that protein called dead Cas9, where the cutting part is
link |
deactivated. So you basically use dCas9, dead Cas9, to bring in an activator or to bring in a
link |
repressor. So you can now ask, is this enhancer changing that gene by taking this modified CRISPR,
link |
which is already modified from the bacteria to be used in humans, that you can now modify the Cas9,
link |
to be dead Cas9, and you can now further modify to bring in a regulator. And you can basically
link |
turn on or turn off that enhancer and then see what is the impact on that gene. So these are the four
link |
ways of linking the locus to the target gene. And that's step number five. Okay, step number five
link |
is find the target gene. And step number six is what the heck does that gene do? You basically now
link |
go and manipulate that gene to basically see what are the processes that change. And you can
link |
basically ask, well, you know, in this particular case, in the FTO locus, we found mesenchymal stem
link |
cells that are the progenitors of white fat and brown fat, or beige fat. We found the rs1421085
link |
nucleotide variant as the causal variant. We found this large enhancer, this master regulator,
link |
I like to call it OB1 for obesity one, like the strongest enhancer associated with and OB1 was
link |
kind of chubby as the actor. I don't know if you remember him. So you basically are using this
link |
Jedi mind trick to basically find out the location of the genome that is responsible, the enhancer
link |
that harbors it, the motif, the upstream regulator, which is ARID5B for AT-rich interacting domain
link |
5B, that's a protein that sort of comes and binds normally. That protein is normally a
link |
repressor. It represses this super enhancer, this massive 12,000 nucleotide master regulatory
link |
control region. And it turns off IRX3, which is a gene that's 600,000 nucleotides away,
link |
and IRX5, which is 1.2 million nucleotides away. So those and what's the effect of turning them off?
link |
That's exactly the next question. So step six is what do these genes actually do?
link |
So we then ask, what do IRX3 and IRX5 do? The first thing we did is look across individuals
link |
for individuals that had higher expression of IRX3 or lower expression of IRX3. And then
link |
we looked at the expression of all of the other genes in the genome. And we looked for simply
link |
correlation. And we found that IRX3 and IRX5 were both correlated positively with lipid metabolism
link |
and negatively with mitochondrial biogenesis. You're like, what the heck does that mean?
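The correlation step described here can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the analysis from the paper: the gene labels `lipid_gene`, `mito_gene`, and `unrelated_gene`, the sample size, and the noise levels are all assumptions made up for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(0)
n_individuals = 200

# Synthetic, hypothetical expression values: one number per individual.
irx3 = [random.gauss(0, 1) for _ in range(n_individuals)]
expression = {
    "lipid_gene": [v + random.gauss(0, 0.5) for v in irx3],   # built to rise with IRX3
    "mito_gene": [-v + random.gauss(0, 0.5) for v in irx3],   # built to fall with IRX3
    "unrelated_gene": [random.gauss(0, 1) for _ in irx3],     # built to be independent
}

# Correlate IRX3 with every other gene across individuals.
correlations = {gene: pearson(irx3, values) for gene, values in expression.items()}
for gene, r in sorted(correlations.items(), key=lambda kv: kv[1]):
    print(f"{gene}: r = {r:+.2f}")
```

In a real data set the same loop would run over every gene in the genome, and the positively and negatively correlated groups would then be checked for shared functions such as lipid metabolism or mitochondrial biogenesis.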
link |
It doesn't sound related to obesity. Not at all, superficially. But lipid metabolism should,
link |
because lipids are these high energy molecules that basically store fat. So IRX3 and IRX5 are
link |
positively correlated with lipid metabolism. So that basically means that when they turn on
link |
lipid metabolism, positively, when they turn on, they turn on lipid metabolism. And they're
link |
negatively correlated with mitochondrial biogenesis. What do mitochondria do in this whole
link |
process? Again, small parenthesis. What are mitochondria?
link |
Mitochondria are little organelles. They arose. They only are found in eukaryotes.
link |
Eu means good. Karyote means nucleus. So truly like a true nucleus. So eukaryotes have a nucleus.
link |
Prokaryotes are before the nucleus. They don't have a nucleus. So eukaryotes have a nucleus.
link |
Compartmentalization. Eukaryotes have also organelles. Some eukaryotes have chloroplasts.
link |
These are the plants. They photosynthesize. Some other eukaryotes, like us, have another
link |
type of organelle called mitochondria. These arose from an ancient species that we engulfed.
link |
This is an endosymbiosis event. Symbiosis, bio means life. Sym means together. So symbiotes
link |
are things that live together. Endosymbiosis, endo means inside. So endosymbiosis means you live
link |
together, holding the other one inside you. So the pre eukaryotes engulfed an organism that
link |
was very good at energy production. And that organism eventually shed most of its genome
link |
to now have only 13 genes in the mitochondrial genome. And those 13 genes are all involved in
link |
energy production, the electron transport chain. So basically, electrons are these massive super
link |
energy rich molecules. We basically have these organelles that produce energy. And when your
link |
muscle exercises, you basically multiply your mitochondria. You basically sort of, you know,
link |
use more and more mitochondria. And that's how you get beefed up. So basically, the muscle sort
link |
of learns how to generate more energy. So basically, every single time your muscles will, you know,
link |
overnight regenerate and sort of become stronger and amplify their mitochondria and so on and so
link |
forth. So what do mitochondria do? The mitochondria use energy to sort of do any kind of task. When
link |
you're thinking, you're using energy. This energy comes from mitochondria. Your neurons have
link |
mitochondria all over the place. Basically, this mitochondria can multiply as organelles
link |
and they can be spread along the body of your muscle. Some of your muscle cells have actually
link |
multiple nuclei, they're polynucleated, but they also have multiple mitochondria to basically
link |
deal with the fact that your muscle is enormous. You can sort of span these super, super long
link |
length. And you need energy throughout the length of your muscle. So that's why you have
link |
mitochondria throughout the length. And you also need transcription through the length. So you
link |
have multiple nuclei as well. So these two processes, lipids, store energy, what do mitochondria do?
link |
So there's a process known as thermogenesis, thermo heat, genesis generation, thermogenesis
link |
generation of heat. Remember that bathtub with the in and out? That's the equation that everybody's
link |
focused on. So how much energy do you consume? How much energy do you burn? But in every thermodynamic
link |
system, there's three parts to the equation. There's energy in, energy out, and energy lost.
link |
Any machine has loss of energy. How do you lose energy? You emanate heat. So heat is energy loss.
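The three-part balance described here can be written as a one-line calculation. This is only a sketch of the bookkeeping: the function name `stored_energy` and the daily kcal figures are hypothetical values chosen for illustration.

```python
def stored_energy(energy_in, energy_out, heat_lost):
    """Energy left over to store (for example as fat) after work and heat loss."""
    return energy_in - energy_out - heat_lost

# Hypothetical daily numbers in kcal, same intake and activity in both cases.
thermogenesis_on = stored_energy(energy_in=2500, energy_out=2000, heat_lost=400)
thermogenesis_off = stored_energy(energy_in=2500, energy_out=2000, heat_lost=100)

print("stored with high heat dissipation:", thermogenesis_on)
print("stored with low heat dissipation:", thermogenesis_off)
```

With intake and expenditure held fixed, the only thing that changes the stored amount is the heat term, which is the third component of the equation.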
link |
Which is where the thermogenesis comes in. Thermogenesis is actually a regulatory process
link |
that modulates the third component of the thermodynamic equation. You can basically
link |
control thermogenesis explicitly. You can turn on and turn off thermogenesis.
link |
And that's when the mitochondria comes into play. Exactly. So IRX3 and IRX5 turn out to be the
link |
master regulators of a process of thermogenesis versus lipogenesis, generation of fat. So IRX3
and IRX5 in most people burn heat, burn calories as heat. So when you eat too much,
link |
just burn it off in your fat cells. So that bathtub has basically a sort of dissipation
link |
knob that most people are able to turn on. I am unable to turn that on because I am a
link |
homozygous carrier for the mutation that changes a T into a C in the rs1421085 allele,
link |
a locus, a SNP. I have the risk allele twice from my mom and from my dad. So I'm unable to
link |
thermogenize. I'm unable to turn on thermogenesis through IRX3 and IRX5 because the regulator
link |
that normally binds here, ARID5B, can no longer bind because it's an AT-rich interacting domain.
link |
And as soon as I change the T into a C, it can no longer bind because it's no longer AT rich.
link |
But doesn't that mean that you're able to use the energy more efficiently?
link |
You're not generating heat or is that... That means I can eat less and get around just fine.
link |
Yes. So that's a feature actually. It's a feature in a food scarce environment.
link |
If we're all starving, I'm doing great. If we all have access to massive amounts of food,
link |
I'm obese basically. That's taken us through the entire process of then
link |
understanding that why mitochondria and then the lipids are both even though distant or somehow
link |
involved. Different sides of the same coin. You basically choose to store energy or you can choose
link |
to burn energy. And that all of that is involved in the puzzle of obesity. And that's what's
link |
fascinating. Here we are in 2007 discovering the strongest genetic association with obesity
link |
and knowing nothing about how it works for almost 10 years. For 10 years, everybody focused on this
link |
FTO gene and they were like, oh, it must have to do something with RNA modification. And it's like,
link |
no, it has nothing to do with the function of FTO. It has everything to do with all of this other
link |
process. And suddenly the moment you solve that puzzle, which is a multi year effort by the way,
link |
a tremendous effort by Melina and many, many others. So this tremendous effort basically
link |
led us to recognize this circuitry. You went from having some 89 common variants associated in
link |
that region of the DNA sitting on top of this gene to knowing the whole circuitry. When you
link |
know the circuitry, you can now go crazy. You can now start intervening at every level. You can
link |
start intervening at the ARID5B level. You can start intervening with CRISPR Cas9 at the single
link |
SNP level. You can start intervening at IRX3 and IRX5 directly there. You can start intervening at
link |
the thermogenesis level because you know the pathway. You can start intervening at the
link |
differentiation level where the decision to make either white fat or beige fat, the energy burning
link |
beige fat is made developmentally in the first three days of differentiation of your adipocytes.
link |
So as they're differentiating, you basically can choose to make fat burning machines or fat
link |
storing machines. And sort of that's how you populate your fat. You basically can now go in
link |
pharmaceutically and do all of that. And in our paper, we actually did all of that. We went in and
link |
manipulated every single aspect. At the nucleotide level, we use CRISPR Cas9 genome editing to
link |
basically take primary adipocytes from risk and non risk individuals and show that by editing
link |
that one nucleotide out of 3.2 billion nucleotides in the human genome, you could then flip between
link |
an obese phenotype and a lean phenotype like a switch. You can basically take
link |
my cells that are non thermogenizing and just flip into thermogenizing cells by changing one
link |
nucleotide. It's mind boggling. It's so inspiring that this puzzle could be solved in this way and
link |
it feels within reach to then be able to crack the problem of some of these diseases.
link |
What are the technologies, the tools that came along that made this possible? What are you excited
link |
about maybe if we just look at the buffet of things that you've kind of mentioned? What's
link |
involved? What should we be excited about? What are you excited about? I love that question because
link |
there's so much ahead of us. So basically solving that one locus required massive amounts of knowledge
link |
that we have been building across the years through the epigenome, through the comparative
link |
genomics to find out the causal variant and the controller regulatory motif through the
link |
conserved circuitry. It required knowing this regulatory genomic wiring. It required Hi-C
link |
of the sort of topologically associated domains to basically find this long range interaction.
link |
It required eQTLs of this sort of genetic perturbation of these intermediate gene phenotypes.
link |
It required all of the arsenal of tools that I've been describing was put together for one
link |
locus and this was a massive team effort, huge investment in time, energy, money, effort,
link |
intellectual, you know, everything. You're referring to, I'm sorry, this one paper? Yeah, this one
link |
single paper. This one single locus. I like to say that this is a paper about one nucleotide
link |
in the human genome, about one bit of information, C versus T in the human genome. That's one bit of
link |
information and we have 3.2 billion nucleotides to go through. So how do you do that systematically?
link |
I am so excited about the next phase of research because the technologies that my group and many
link |
other groups have developed allows us to now do this systematically, not just one locus at a time,
link |
but thousands of loci at a time. So let me describe some of these technologies.
link |
The first one is automation and robotics. So basically, you know, we talked about how you
link |
can take all of these molecules and see which of these molecules are targeting each of these
link |
genes and what do they do. So you can basically now screen through millions of molecules,
link |
through thousands and thousands and thousands of plates, each of which has thousands and thousands
link |
and thousands of molecules, every single time testing, you know, all of these genes and asking
link |
which of these molecules perturb these genes. So that's technology number one, automation and
link |
robotics. Technology number two is parallel readouts. So instead of perturbing one locus
link |
and then asking if I use CRISPR Cas9 on this enhancer to basically use dCas9 to turn on or
link |
turn off the enhancer, or if I use CRISPR Cas9 on the SNP to basically change that one SNP at a time,
link |
then what happened? But we have 120,000 disease associated SNPs that we want to test.
link |
We don't want to spend 120,000 years doing it. So what do we do? We've basically developed this
link |
technology for massively parallel reporter assays, MPRA. So in collaboration with Tarjei
Mikkelsen, Eric Lander, I mean, Jay Shendure's group has done a lot of that. So there's a lot of
link |
groups that basically have developed technologies for testing 10,000 genetic variants at a time.
link |
Okay. How do you do that? You know, we talked about microarray technology, the ability to
link |
synthesize these huge microarrays that allow you to do all kinds of things like measure gene
link |
expression by hybridization, by measuring the genotype of a person, by looking at hybridization
link |
with one version with a T versus the other version with a C, and then sort of figuring out that I
link |
am a risk carrier for obesity based on these hybridization, differential hybridization in my
link |
genome that says, oh, you seem to only have this allele, or you seem to have that allele.
link |
Microarrays can also be used to systematically synthesize small fragments of DNA. So you can
link |
basically synthesize these 150 nucleotide long fragments across 450,000 spots at a time.
link |
You can now take the result of that synthesis, which basically works through all of these sort
link |
of layers of adding one nucleotide at a time. You can basically just type it into your computer
link |
and order it. And you can basically order 10,000 or 100,000 of these small DNA segments at a time.
link |
And that's where awesome molecular biology comes in. You can basically take all these segments,
link |
have a common start and end barcode or sort of ligator, like just like pieces of a puzzle.
link |
You can make the same end piece and the same start piece for all of them. And you can now
link |
use plasmids, which are these extra chromosomal small DNA circular segments
link |
that are basically inhabiting all our genomes. We basically have, you know,
link |
plasmids floating around. Bacteria use plasmids for transferring DNA. And that's where they
link |
put a lot of antibiotic resistance genes. So they can easily transfer them from one bacterium to the
link |
other. So one bacterium evolves a gene to be resistant to a particular antibiotic. It basically
link |
says to all its friends, Hey, here's that sort of DNA piece, we can now coopt these plasmids
link |
into human cells. We can basically make a human cell culture and add plasmids to that human cell
link |
culture that contain the things that you want to test. You now have this library of 450,000
link |
elements. You can insert them each into the common plasmid and then test them in millions of cells
link |
in parallel. And the common plasmid is all the same before you add it. Exactly. The rest of the
link |
plasmid is the same. So it's called an episomal reporter assay. Episome means not inside the
link |
genome, it's sort of outside the chromosomes. So it's an episomal assay that allows you to have
link |
a variable region where you basically test 10,000 different enhancers. And you have a common region
link |
which basically has the same reporter gene. You now can do some very cool molecular biology.
link |
You can basically take the 450,000 elements that you've generated. And you have a piece of the
link |
puzzle here, piece of the puzzle here, which is identical. So they're compatible with that plasmid.
link |
You can chop them up in the middle to separate a barcode reporter from the enhancer and in the
link |
middle put the same gene again using the same piece of the puzzle. You now can have a barcode
link |
readout of what is the impact of 10,000 different versions of an enhancer on gene expression.
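A barcode readout of this kind can be sketched as a counting problem. The sketch below assumes one common way of scoring such assays, dividing each barcode's RNA read count by its count in the plasmid DNA pool; the transcript does not spell out the scoring, so treat that ratio, the `design` table, and the read lists as hypothetical.

```python
from collections import Counter

# Hypothetical design table: barcode -> (locus, allele) of the tested enhancer.
design = {
    "AAAC": ("locus1", "risk"),
    "AAAG": ("locus1", "non_risk"),
    "CCCA": ("locus2", "risk"),
    "CCCT": ("locus2", "non_risk"),
}

# Hypothetical sequencing reads, already reduced to their barcodes.
dna_reads = ["AAAC"] * 10 + ["AAAG"] * 10 + ["CCCA"] * 10 + ["CCCT"] * 10
rna_reads = ["AAAC"] * 5 + ["AAAG"] * 40 + ["CCCA"] * 20 + ["CCCT"] * 20

def activity_by_construct(rna, dna, design):
    """RNA/DNA barcode count ratio for every construct in the design."""
    rna_counts, dna_counts = Counter(rna), Counter(dna)
    return {
        design[barcode]: rna_counts[barcode] / dna_counts[barcode]
        for barcode in design
        if dna_counts[barcode] > 0  # skip constructs never seen in the plasmid pool
    }

activity = activity_by_construct(rna_reads, dna_reads, design)
for locus in sorted({loc for loc, _ in activity}):
    ratio = activity[(locus, "risk")] / activity[(locus, "non_risk")]
    print(locus, "risk / non-risk activity ratio:", ratio)
```

Because every construct carries its own barcode, all of them can share one pool of cells and one sequencing run, and the comparison between the two alleles of a locus falls out of the counts.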
link |
So we're not doing one experiment, we're doing 10,000 experiments. And those 10,000 can be
link |
5,000 of different loci and each of them in two versions, risk or non risk. I can now test tens
link |
of thousands. These are little hypotheses. Exactly. And then you can do 10,000 and wait.
link |
You can test 10,000 hypotheses at once. How hard is it to generate those 10,000?
link |
Trivial, trivial. But it's biology. No, no, generating the 10,000 is trivial because you
link |
basically add, it's biotechnology. You basically have these arrays that add one nucleotide at a
link |
time at every spot. And yet, so it's printing in it. So you're able to control. Super costly.
link |
10,000 bucks. So this isn't millions? 10,000 bucks for 10,000 experiments. Sounds like the right,
link |
you know. So that's super, that's exciting because you don't have to do one thing at a time.
link |
You can now use that technology, these massively parallel reporter assays to test 10,000 locations
link |
at a time. We've made multiple modifications to that technology. One was Sharpr-MPRA, which stands
link |
for, you know, basically getting a higher resolution view by tiling these elements. So you can see
link |
where along the region of control are they acting. And we made another modification called HiDRA
link |
for high, you know, definition regulatory annotation or something like that, which basically
link |
allows you to test 7 million of these at a time by sort of cutting them directly from the DNA.
link |
So instead of synthesizing, which basically has the limit of 450,000 that you can synthesize at a
link |
time, basically said, hey, if we want to test all accessible regions of the genome, let's just do an
link |
experiment that cuts accessible regions. Let's take those accessible regions, put them all with the
link |
same end joints of the puzzles, and then now use those to create a much, much larger array of things
link |
that you can test. And then tiling all of these regions, you can then pinpoint what are the driver
link |
nucleotides, what are the elements, how are they acting across 7 million experiments at a time.
link |
So basically, this is all the same family of technology, where you're basically using these
link |
parallel readouts of the barcodes. And then, you know, to do this, we used a technology called
link |
STARR-seq for self transcribing reporter assays, a technology developed by Alex Stark, my
link |
former postdoc, who's now a PI over in Vienna. So we basically coupled the STARR-seq, the self
link |
transcribing reporters, where the enhancer can be part of the gene itself. So instead of having a
link |
separate barcode, that enhancer basically acts to turn on the gene, and it's transcribed as part of
link |
the gene. You don't have to have the two separate parts. Exactly. So you can just read them directly.
link |
So there are constant improvements in this whole process. By the way, generating all these options,
link |
is it basically brute force? How much human intuition is there? Oh, gosh, of course, it's human
link |
intuition and human creativity and incorporating all of the input data sets. Because again,
link |
the genome is enormous, 3.2 billion, you don't want to test that. Instead, you basically use all
link |
of these tools that I've talked about already, you generate your top favorite 10,000 hypotheses,
link |
and then you go and test all 10,000. And then from what comes out, you can then go to the next step.
link |
So that's technology number two. So technology number one is robotics, automation, where you
link |
have thousands of wells, and you constantly test them. The second technology is instead
link |
of having wells, you have these massively parallel readouts in sort of these pooled assays.
link |
The third technology is coupling CRISPR perturbations with these single cell RNA readouts.
link |
So let me make another parenthesis here to describe now single cell RNA sequencing.
link |
So what does single cell RNA sequencing mean? So RNA sequencing is what has been traditionally used,
link |
well, traditionally the last 20 years, ever since the advent of next generation sequencing.
link |
So basically before, RNA expression profiling was based on these microarrays. The next technology
link |
after that was based on sequencing. So you chop up your RNA and you just sequence small molecules,
link |
just like you would sequence a genome, basically reverse transcribe the small RNAs into DNA,
link |
and you sequence that DNA in order to get the number of sequencing reads corresponding to
link |
the expression level of every gene in the genome. You now have RNA sequencing. How do you go to
link |
single cell RNA sequencing? That technology also went through stages of evolution. The first was
link |
microfluidics. You basically had these, or even chambers, you basically had these ways of isolating
link |
individual cells, putting them into a well for every one of these cells. So you have 384 well
link |
plates, and you now do 384 parallel reactions to measure the expression of 384 cells. That sounds
link |
amazing, and it was amazing. But we want to do a million cells. How do you go from these wells
link |
to a million cells? You can't. So what the next technology was after that is instead of using
link |
a well for every reaction, you now use a lipid droplet for every reaction. So you use micro
link |
droplets as reaction chambers to basically amplify RNA. So here's the idea. You basically have
link |
microfluidics where you basically have every single cell coming down one tube in your microfluidics,
link |
and you have little bubbles getting created in the other way with
link |
specific primers that mark every cell with its own barcode. You basically couple the two,
link |
and you end up with little bubbles that have a cell and tons of markers for that cell.
link |
You now mark up all of the RNA for that one cell with the same exact barcode,
link |
and you then lyse all of the droplets, and you sequence the heck out of that,
link |
and you have for every RNA molecule a unique identifier that tells you what cell was it on.
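Once every RNA read carries its cell's barcode, turning the pooled reads back into per-cell expression is a grouping step. The sketch below assumes each read has already been reduced to a (cell barcode, gene) pair; the barcodes and reads are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical reads from the pooled, lysed droplets: (cell barcode, gene).
reads = [
    ("CELL_A", "IRX3"),
    ("CELL_A", "IRX3"),
    ("CELL_A", "IRX5"),
    ("CELL_B", "IRX3"),
    ("CELL_B", "FTO"),
    ("CELL_B", "FTO"),
    ("CELL_B", "FTO"),
]

def counts_per_cell(reads):
    """Group reads by cell barcode and count reads per gene within each cell."""
    table = defaultdict(Counter)
    for cell_barcode, gene in reads:
        table[cell_barcode][gene] += 1
    return table

counts = counts_per_cell(reads)
for cell_barcode in sorted(counts):
    print(cell_barcode, dict(counts[cell_barcode]))
```

The sequencing itself never needs to keep cells apart: the barcode attached inside the droplet is enough to reassemble a cell-by-gene table afterwards.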
link |
That is such good engineering, microfluidics, and using some kind of primer to put a label
link |
on the thing. I mean, you're making it sound easy. I assume it's beautiful, but it's gorgeous.
link |
So there's the next generation. So that's the second generation. Next generation is forget
link |
the microfluidics altogether. Just use big bottles. How can you possibly do that with big
link |
bottles? So here's the idea. You dissociate all of your cells or all of your nuclei from
link |
complex cells like brain cells that are very long and sticky, so you can't do that. So if you have
link |
blood cells or if you have neuronal nuclei or brain nuclei, you can basically dissociate,
link |
let's say, a million cells. You now want to add a unique barcode in each one of a million cells
link |
using only big bottles. How can you possibly do that? Sounds crazy, but here's the idea.
link |
You use a hundred of these bottles. You randomly shuffle all your million cells,
link |
and you throw them into those hundred bottles randomly, completely random. You add one barcode
link |
out of a hundred to every one of the cells. You now take them all out, you shuffle them
link |
again, and you throw them again into the same hundred bottles, but now in a different randomization,
link |
and you add a second barcode. So every cell now has two barcodes. You take them out again,
link |
you shuffle them, and you throw them back in. A third barcode is added randomly from
link |
the same hundred barcodes. You've now labeled every cell probabilistically based on the unique
link |
path that it took of which of a hundred bottles did it go for the first time, which of a hundred
link |
bottles the second time, and which of a hundred bottles the third time. A hundred times a hundred
link |
times a hundred is a million unique barcodes in every single one of these cells without ever using
link |
microfluidics. It's beautiful, right? From a computer science perspective, it's very clever. So you now
link |
have the single cell sequencing technology. You can use the wells, you can use the bubbles,
link |
or you can use the bottles. The bottles still sound pretty damn good. The bottles are awesome,
link |
and that's basically the main technology that we're using. So the bottles is the main technology.
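The bottle scheme can be simulated in a few lines. This is a toy model that assumes every cell lands in a uniformly random bottle in each round, independently of the other cells; the function names and the cell count of 1,000 are choices made for the example.

```python
import math
import random
from collections import Counter

def split_pool(n_cells, n_bottles, n_rounds, rng):
    """Give every cell one random bottle barcode per round; return the label tuples."""
    labels = [() for _ in range(n_cells)]
    for _ in range(n_rounds):
        # Pool, shuffle, redistribute: each cell ends up in a random bottle
        # and picks up that bottle's barcode.
        labels = [label + (rng.randrange(n_bottles),) for label in labels]
    return labels

def expected_unique_fraction(n_cells, n_labels):
    """Expected fraction of cells whose combined label is shared with no other
    cell, assuming labels are drawn uniformly and independently."""
    return (1 - 1 / n_labels) ** (n_cells - 1)

rng = random.Random(0)
labels = split_pool(n_cells=1000, n_bottles=100, n_rounds=3, rng=rng)
label_counts = Counter(labels)
unique_fraction = sum(1 for lab in labels if label_counts[lab] == 1) / len(labels)

print("possible combined labels:", 100 ** 3)
print("simulated unique fraction:", unique_fraction)
print("expected unique fraction:", expected_unique_fraction(1000, 100 ** 3))
```

Three rounds of a hundred bottles give a label space of a million combinations. Under this idealized model, the labels are unique only probabilistically, so the number of cells loaded has to stay well below the number of possible labels for nearly every cell to end up with its own combination.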
link |
So there are kits now that companies just sell to basically carry out single cell
link |
RNA sequencing that you can basically, for $2,000, you can basically get 10,000 cells from one sample.
link |
And for every one of those cells, you basically have the transcription of thousands of genes.
link |
And of course, the data for any one cell is noisy, but being computer scientists, we can aggregate
link |
the data from all of the cells together across thousands of individuals together to basically
link |
make very robust inferences. So the third technology is basically single cell RNA sequencing that allows
link |
you to now start asking not just what is the brain expression level difference of that genetic variant,
link |
but what is the expression difference of that one genetic variant across every single subtype
link |
of brain cell? How is the variance changing? You can't just, you know, with a brain sample,
link |
you can just ask about the mean, what is the average expression? If I instead have 3,000 cells
link |
that are neurons, I can ask not just what is the neuronal expression, I can say for
link |
layer five excitatory neurons, of which I have, I don't know, 300 cells, what is the variance
link |
that this genetic variant has? So suddenly, it's amazingly more powerful. I can basically start
link |
asking about this middle layer of gene expression at unprecedented levels. And when you look at the
link |
average, it washes out some potentially important signal that corresponds to ultimately the disease.
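How an average can hide a signal is easy to show with made-up numbers. In the sketch below the per-cell expression values and the two cell types are hypothetical, built so that the risk allele raises expression in one cell type and lowers it in the other.

```python
from statistics import mean

# Hypothetical per-cell expression of one gene, keyed by (cell type, genotype).
cells = {
    ("excitatory_neuron", "risk"): [8, 9, 10, 9],
    ("excitatory_neuron", "non_risk"): [4, 5, 6, 5],
    ("astrocyte", "risk"): [2, 1, 2, 3],
    ("astrocyte", "non_risk"): [6, 5, 6, 7],
}

def bulk_mean(genotype):
    """Average over all cells of a genotype, ignoring cell type (a bulk sample)."""
    values = [v for (_, g), vals in cells.items() if g == genotype for v in vals]
    return mean(values)

def effect_by_cell_type():
    """Risk minus non-risk mean expression, computed separately per cell type."""
    cell_types = sorted({cell_type for cell_type, _ in cells})
    return {
        ct: mean(cells[(ct, "risk")]) - mean(cells[(ct, "non_risk")])
        for ct in cell_types
    }

bulk_effect = bulk_mean("risk") - bulk_mean("non_risk")
per_type_effect = effect_by_cell_type()

print("bulk effect:", bulk_effect)
print("per cell type:", per_type_effect)
```

In this constructed case the bulk difference is zero even though each cell type shows a clear effect, which is the kind of signal a single-cell readout can recover.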
link |
Completely. Yeah. So that I can do that at the RNA level, but I can also do that at the DNA level
link |
for the epigenome. So remember how before I was telling you about all these technologies that
link |
we're using to probe the epigenome, one of them is DNA accessibility. So what we're doing in my lab
link |
is that from the same dissociation of say a brain sample, where you now have all these tens of thousands
link |
of cells floating around, you basically take half of them to do RNA profiling. And the other half to
link |
do epigenome profiling, both at the single cell level. So that allows you to now figure out what
link |
are the millions of DNA enhancers that are accessible in every one of tens of thousands of cells.
link |
And computationally, we can now take the RNA and the DNA readouts and group them together
link |
to basically figure out how is every enhancer related to every gene. And remember these sort
link |
of enhancer gene linking that we were doing across 833 samples. 833 is awesome. Don't get me wrong.
link |
But 10 million is way more awesome. So we can now look at correlated activity across 2.3 million
link |
enhancers and 20,000 genes in each of millions of cells to basically start piecing together the
link |
regulatory circuitry of every single type of neuron, every single type of astrocytes, oligodendrocytes,
link |
microglial cell, inside the brains of 1,500 individuals that we've sampled across multiple
link |
different brain regions across both DNA and RNA. So that's the data set that my team generated last
link |
year alone. So in one year, we've basically generated 10 million cells from human brain across
link |
a dozen different disorders across schizophrenia, Alzheimer's, frontotemporal dementia,
link |
Lewy body dementia, ALS, you know, Huntington's disease, post traumatic stress disorder, autism,
link |
like, you know, bipolar disorder, healthy aging, etc. So it's possible that even just within that
link |
data set lie a lot of keys to understanding these diseases and then be able to, like, directly leads
link |
to then treatment. Correct. Correct. So basically, we are now motivating. Yeah. So our computational
link |
team is in heaven right now. And we're looking for people. I mean, if you have listeners who are
link |
super smart. So this is a very interesting kind of side question. How much of this is biology?
link |
How much of this is computation? So you head the computational biology group, but how much of
link |
I should, should you be comfortable with biology to be able to solve some of these problems?
link |
If you just find if you put several of the hassle you were on,
link |
fundamentally, are you thinking like a computer scientist here?
link |
You have to. This is the only way. As I said, we are the descendants of the first digital
link |
computer. We're trying to understand the digital computer, understand, we're trying to understand
link |
the circuitry, the logic of this digital, you know, core computer and all of these analog layers
link |
surrounding it. So you, you know, the case that I've been making is that you cannot think one gene
link |
at a time. The traditional biology is dead. There's no way you can solve disease with
link |
traditional biology. You need it as a component. Once you figured out IRX3 and IRX5, you now can
link |
then say, Hey, have you guys worked on those genes with your single gene approach? We'd love to know
link |
everything you know. And if you haven't, we now know how important these genes are. Let's now launch
link |
a single gene program to dissect them and understand them. But you cannot use that as a way to dissect
link |
disease. You have to think genomically. You have to think from the global perspective,
link |
and you have to build these circuits systematically. So we need numbers of computer
link |
scientists who are interested and willing to dive into this data, you know, fully, fully in and sort
link |
of extract meaning. We need computer science people who can understand sort of machine learning and
link |
inference and sort of, you know, decouple these matrices, come up with super smart ways of sort
link |
of dissecting them. But we also need computer scientists who understand biology,
link |
who are able to design the next generation of experiments. Because many of these experiments,
link |
no one in their right mind would design them without thinking of the analytical approach
link |
that you would use to deconvolve the data afterwards. Because it's massive amounts of
link |
ridiculously noisy data. And if you don't have the computational pipeline in your head before you
link |
even design the experiment, you would never design the experiment that way. That's brilliant. So you
link |
in designing the experiment, you have to see the entirety of the computational pipeline.
link |
That drives the design. That even drives the necessity for that design. Basically, you know,
link |
if you didn't have a computer scientist way of thinking, you would never design these hugely
link |
combinatorial, massively parallel experiments. So that's why you need interdisciplinary teams.
link |
You need teams. And I want to sort of clarify that what do we mean by computational biology group?
link |
The focus is not on computational, the focus is on biology. So we are a biology group. What type
link |
of biology? Computational biology. That's the type of biology that uses the whole genome. That's the
link |
type of biology that designs experiments, genomic experiments that can only be interpreted in the
link |
context of the whole genome. Right. So it's philosophically looking at biology as a computer.
link |
Correct. Correct. So which is in the context of the history of biology is a big transformation.
link |
Yeah. Yeah. You can think of the name as what do we do? Only computation. That's not true. But
link |
how do we study it? Only computationally, that is true. So all of these single cell sequencing
link |
can now be coupled with the technology that we talked about earlier for perturbation.
link |
So here's a crazy thing. Instead of using these wells and these robotic systems for doing
link |
one drug at a time or for perturbing one gene at a time in thousands of wells, you can now do this
link |
using a pool of cells and single cell RNA sequencing. How? You basically can take these
link |
perturbations using CRISPR and instead of using a single guide RNA, you can use a library of
link |
guide RNAs generated exactly the same way using this array technology. So you synthesize a thousand
link |
different guide RNAs. You now take each of these guide RNAs and you insert them in a pool of cells
link |
where every cell gets one perturbation and you use CRISPR editing or CRISPR,
link |
so with either CRISPR Cas9 to edit the genome with these thousand perturbations or with the
link |
activation or with the repression and you now can have a single cell readout where every single cell
link |
has received one of these modifications and you can now in massively parallel ways couple the
link |
perturbation and the readout in a single experiment. How are you tracking which perturbations
link |
each cell received? So there are ways of doing that, but basically one way is to make
link |
that perturbation an expressible vector so that part of your RNA reading is actually
link |
that perturbation itself. So you can basically put it in an expressible part so you can self
link |
drive it. So the point that I want to get across is that the sky is the limit. You basically have
link |
these tools, these building blocks of molecular biology, you have these massive data sets of
link |
computational biology, you have this huge ability to sort of use machine learning and statistical
link |
methods and you know linear algebra to sort of reduce the dimensionality of all these massive
link |
data sets and then you end up with a series of actionable targets that you can then couple
link |
with pharma and just go after systematically. So the ability to sort of bring genetics to the
link |
epigenomics to the transcriptomics to the cellular readouts using these sort of high
link |
throughput perturbation technologies that I'm talking about and ultimately to the organism
link |
through the electronic health record endophenotypes and ultimately the disease battery of assays
link |
at the cognitive level at the physiological level and you know every other level.
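The pooled perturbation readout described above can be illustrated with a minimal sketch. It assumes each cell's sequencing data already yields read counts per guide RNA (because the guide is expressed, as discussed) plus an expression value for one gene of interest. The cell names, guide names, thresholds, and the helper functions `assign_guide` and `mean_expression_by_guide` are all hypothetical, chosen only for illustration, and real pipelines handle noise far more carefully.

```python
from collections import defaultdict

# Hypothetical toy data: per-cell guide-RNA read counts and the
# expression level of a single gene of interest.
cells = {
    "cell_1": {"guides": {"gRNA_A": 12}, "expr": 5.0},
    "cell_2": {"guides": {"gRNA_A": 9, "gRNA_B": 1}, "expr": 7.0},
    "cell_3": {"guides": {"gRNA_B": 15}, "expr": 1.0},
    "cell_4": {"guides": {}, "expr": 4.0},
}


def assign_guide(guide_counts, min_reads=3, min_fraction=0.8):
    """Return the dominant guide for a cell, or None if undetected or ambiguous.

    The thresholds are illustrative assumptions, not values from the conversation.
    """
    total = sum(guide_counts.values())
    if total == 0:
        return None
    guide, reads = max(guide_counts.items(), key=lambda kv: kv[1])
    if reads < min_reads or reads / total < min_fraction:
        return None
    return guide


def mean_expression_by_guide(cells):
    """Group cells by their assigned perturbation and average the readout."""
    grouped = defaultdict(list)
    for info in cells.values():
        guide = assign_guide(info["guides"])
        if guide is not None:
            grouped[guide].append(info["expr"])
    return {guide: sum(values) / len(values) for guide, values in grouped.items()}


result = mean_expression_by_guide(cells)
```

Comparing the per-guide averages against unperturbed cells is one simple way to couple perturbation and readout in a single pooled experiment.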
link |
There is no better or more exciting field in my view to be a computer scientist in,
link |
or to be a scientist in, period. Basically this confluence of technologies of computation,
link |
of data, of insight and of tools for manipulation is unprecedented in human history and I think
link |
this is what's shaping the next century to really be a transformative century for our species
link |
and for our planet. So you think the 21st century will be remembered for the big
link |
leaps in understanding and alleviation of biology? If you look at the path between discovery and
link |
therapeutics it's been on the order of 50 years, it's been shortened to 40, 30, 20 and now it's on
link |
the order of 10 years but the huge number of technologies that are going on right now for
link |
discovery will result undoubtedly in the most dramatic manipulation of human biology that
link |
we've ever seen in the history of humanity in the next few years. Do you think we might be able to
link |
cure some of the diseases we started this conversation with? Absolutely. It's only a matter
link |
of time. Basically the complexity is enormous and I don't want to underestimate the complexity
link |
but the number of insights is unprecedented and the ability to manipulate is unprecedented
link |
and the ability to deliver these small molecules and other non traditional medicine perturbations.
link |
There's a lot of new generation of perturbations that you can use at the DNA level, at the RNA
link |
level, at the microRNA level, at the epigenomic level. There's a battery of new generations of
link |
perturbations. If you couple that with cell type identifiers that can basically sense when you are
link |
in the right cell based on the specific combination and then turn on that intervention for that cell
link |
you can now think of combinatorial interventions where you can basically sort of feed a synthetic
link |
biology construct to someone that will basically do different things in different cells. So
link |
basically for cancer this is one of the therapeutics that our collaborator Ron Weiss is using to
link |
basically start sort of engineering the circuits that will use microRNA sensors of the environment
link |
to sort of know if you're in a tumor cell or if you're in an immune cell or if you're in a
link |
stromal cell and so forth and basically turn on particular interventions there. You can sort of create constructs
link |
that are tuned to only the liver cells or only the heart cells or only the you know brain cells
link |
and then have these new generations of therapeutics coupled with this immense amount of knowledge
link |
on the sort of which targets to choose and what biological processes to measure
link |
and how to intervene. My view is that disease is going to be fundamentally altered and alleviated
link |
as we go forward. Next time we talk we'll talk about the philosophical implications
link |
of that and the effect on life but let's stick to biology for just a little longer. We did pretty
link |
good today we're still stuck to the science. What are you excited about in terms of the future
link |
of this field the technologies in your own group in your mind you're leading the world at MIT in
link |
the science and the engineering of this work so what are you excited about here? I could not be
link |
more excited. We are one of many many teams who are working on this. In my team the most exciting
link |
parts are, you know, manifold. So basically we've now assembled these batteries of technologies,
link |
we've assembled these massive data sets and now we're really sort of in the stage of our team's
link |
path of generating disease insights so we are simultaneously working on a paper on schizophrenia
link |
right now that is basically using the single cell profiling technologies using these editing and
link |
manipulation technologies to basically show how the master regulators underlying changes in the
link |
brain that are sort of found in schizophrenia are in fact affecting excitatory neurons and
link |
inhibitory neurons in pathways that are active both in synaptic pruning but also in early development
link |
we've basically found this set of four regulators that are connecting these two processes that were
link |
previously separate in schizophrenia in sort of having a sort of more unified view across those
link |
two sides. The second one is in the area of metabolism. We basically now have a beautiful
link |
collaboration with the Goodyear lab that's basically looking at multi tissue perturbations in six or
link |
seven different tissues across the body in the context of exercise and in the context of nutritional
link |
interventions using both mouse and human where we can basically see what are the cell to cell
link |
communications that are changing across them and what we're finding is this immense role
link |
of both immune cells as well as adipocyte stem cells in sort of reshaping that circuitry of all
link |
of these different tissues and that's sort of pointing to a new path for therapeutic intervention
link |
there in Alzheimer's it's this huge focus on microglia and now we're discovering different
link |
classes of microglial cells that are basically either synaptic or immune and these are playing
link |
vastly different roles in Alzheimer's versus in schizophrenia and what we're finding is this
link |
immense complexity as you go further and further down of how in fact there's 10 different types
link |
of microglia each with their own sort of expression programs we used to think of them as oh yeah they're
link |
microglia but in fact now we're realizing just even in that sort of least abundant of cell types
link |
there's this incredible diversity there. The difference between brain regions is another
link |
sort of major major insight again you know one would think that oh astrocytes are astrocytes no
link |
matter where they are but no there's incredible region specific differences in the expression
link |
patterns of all of the major brain cell types across different brain regions so basically
link |
there's the neocortical regions that are sort of the recent innovation that makes us so different
link |
from all other species there's the sort of you know reptilian brain sort of regions that are sort of
link |
much more, you know, extremely distinct. There's the cerebellum. Each of those
link |
basically is associated in a different way with disease and what we're doing now is looking into
link |
pseudo temporal models for how disease progresses across different regions of the brain if you look
link |
at Alzheimer's it basically starts in this small region called the entorhinal cortex and then it
link |
spreads through the brain, through the hippocampus, and ultimately
link |
affecting the neocortex and with every brain region that it hits it basically has a different
link |
impact on the cognitive and you know memory aspects orientation short term memory long
link |
term memory etc which is you know dramatically affecting the cognitive path that the individuals
link |
go through so what we're doing now is creating these computational models for ordering the cells
link |
and the regions and the individuals according to their ability to predict Alzheimer's disease
link |
so we can have a cell level predictor of pathology that allows us to now create a temporal time
link |
course that tells us when every gene turns on along this pathology progression and then trace
link |
that across regions and pathological measures that are region specific but also cognitive
link |
measures and so on and so forth so that allows us to now sort of for the first time look at can we
link |
actually do early intervention for Alzheimer's where we know that the disease starts manifesting
link |
for 10 years before you actually have your first cognitive loss can we start seeing that path to
link |
build new diagnostics new prognostics new biomarkers for this sort of early intervention
link |
in Alzheimer's the other aspect that we're looking at is mosaicism we talked about the
link |
common variants and the rare variants but in addition to those rare variants as your initial cell
link |
that forms the zygote divides and divides and divides with every cell division there are
link |
additional mutations that are happening so what you end up with is your brain being a mosaic
link |
of multiple different types of genetic underpinnings some cells contain a mutation that other cells
link |
don't have so every human has the common variants that all of us carry to some degree the rare
link |
variants that your immediate tree of the human species carries and then there's the somatic
link |
variants which is the tree that happened after the zygote that sort of forms your own body so
link |
these somatic alterations are something that has been previously inaccessible to study in human
link |
postmortem samples but right now with the advent of single cell RNA sequencing and in this particular
link |
case we're using the well based sequencing which is much more expensive but gives you a lot richer
link |
information about each of those transcripts so we're using now that richer information to infer
link |
mutations that have happened in each of the thousands of genes that sort of are active in
link |
these cells and then understand how the genome relates to the function this genotype phenotype
link |
relationship that we usually build in GWAS, in genome-wide association studies, between genetic
link |
variation and disease we're now building that at the cell level where for every cell we can relate
link |
the unique specific genome of that cell with the expression patterns of that cell and the predicted
link |
function using these predictive models that I mentioned before on dysregulation for cognition
link |
for pathology in Alzheimer's at the cell level and what we're finding is that the genes that are
link |
altered and the genetic regions that are altered in common variants versus rare variants versus
link |
somatic variants are actually very different from each other the somatic variants are pointing to
link |
neuronal energetics and oligodendrocyte functions that are not visible in the genetic
link |
regions that you find for the common variants probably because they have too strong of an effect
link |
that evolution is just not tolerating them on the common side of the allele frequency spectrum
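The three classes of variants discussed above (common, rare, and somatic) can be separated with a rough rule of thumb, sketched below. This is only an illustration under stated assumptions: the function `classify_variant` is hypothetical, the 1% population-frequency cutoff is a commonly used convention rather than a value from the conversation, and treating "seen in some but not all cells" as somatic ignores the dropout and sequencing noise that real single-cell data would require modeling.

```python
def classify_variant(population_freq, cells_with_variant, cells_total,
                     common_threshold=0.01):
    """Label a variant as 'somatic', 'common', or 'rare'.

    Assumptions (illustrative only):
    - a variant present in some, but not all, of one individual's cells is
      treated as somatic (mosaic), since it arose after the zygote;
    - otherwise it is treated as inherited and split into common or rare
      by its allele frequency in the population.
    """
    if 0 < cells_with_variant < cells_total:
        return "somatic"
    if population_freq >= common_threshold:
        return "common"
    return "rare"


# Hypothetical examples: (population frequency, cells carrying it, cells profiled)
labels = [
    classify_variant(0.20, 100, 100),    # carried by many people, in every cell
    classify_variant(0.0005, 100, 100),  # carried by few people, in every cell
    classify_variant(0.0, 7, 100),       # private to a subset of this person's cells
]
```

Once variants are labeled this way, the genes hit by each class can be compared, which is the kind of contrast described in the conversation.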
link |
so the somatic one that's the variation that happens after the after you individual I mean
link |
this is a dumb question but there's mutation and variation I guess that happens there
link |
and you're saying that through this, if we focus in on individual cells, we're able to detect
link |
the story that's interesting there and that might be a very unique kind of important variability
link |
that arises for, you said, neuronal or something? That was energetics. Energetics. So, I mean,
link |
the metabolism of humans is dramatically altered from that of nearby species you know we talked
link |
about that last time that basically we are able to consume meat that is incredibly energy rich
link |
and that allows us to sort of have functions that are you know meeting this humongous brain
link |
that we have basically on one hand every one of our brain cells is much more energy efficient than
link |
our neighbors than our relatives number two we have way more of these cells and number three we
link |
have you know this new diet that allows us to now feed all these needs that basically creates
link |
a massive amount of damage oxidative damage from this huge super powered factory of ideas and thoughts
link |
that we carry in our skull and that factory has energetic needs and there's a lot of sort of
link |
biological processes underlying that that we are finding are altered in the context of Alzheimer's
link |
disease. That's fascinating. So you have to consider all of these systems
link |
if you want to understand even something like diseases that you would maybe traditionally
link |
associate with just the particular cells of the brain yeah the immune system the metabolic system
link |
Metabolic system. And these are all the things that make us uniquely human. So our immune system
link |
is dramatically different from that of our neighbors our societies are so much more clustered the
link |
history of infections that have plagued the human population is you know dramatically different
link |
from every other species the you know the way that our society and our population has sort of
link |
exploded has basically put unique pressures on our immune system and our immune system has both
link |
coped with that density and also been shaped by as I mentioned the you know vast amount of death
link |
that has happened in the Black Plague and other sort of selective events in human history
link |
famines ice ages and so forth so that's number one, on the sort of immune side. On the metabolic
link |
side you know again we are able to sort of run marathons you know I don't know if you remember
link |
the sort of human versus horse experiment where the horse actually tires out faster than the human
link |
and the human actually wins so on the metabolic side we're dramatically different on the immune
link |
side we're dramatically different on the brain side again you know no need to sort of you know
link |
it's a no brainer of how our brain is like enormously more capable and then in you know
link |
in the side of cancer so basically the cancers that humans are having the exposures the environmental
link |
exposures is again dramatically different and the lifespan the expansion of human lifespan
link |
is unseen in any other species in you know recent evolutionary history and that now leads to a lot
link |
of new disorders that are starting to you know manifest late in life so you know Alzheimer's
link |
is one example where basically you know these vast energetic needs over a lifetime of thinking
link |
can basically lead to all of these debris and eventually saturate the system and lead to you
link |
know Alzheimer's in late life but there's you know there's just such a dramatic
link |
set of frontiers when it comes to aging research. So what I often like to say
link |
is that if you want to re-engineer a car to go from 70 miles an hour to 120 miles an hour
link |
that's fine you can basically you know fix a few components if you wanted to now go at 400 miles an
link |
hour you have to completely redesign the entire car because the system is just not evolved to go
link |
that far basically our human body has only evolved to live to I don't know 120 maybe we can get to
link |
150 with minor changes but if you know as we start pushing these frontiers for not just living but
link |
well living, the ef zin that we talked about last time, so to basically push ef zin into the 80s
link |
and 90s and 100s and you know much further than that we will face new challenges that have you
link |
know never been faced before in terms of cancer the number of divisions in terms of Alzheimer's
link |
and brain related disorders in terms of metabolic disorders in terms of regeneration
link |
there's just so many different frontiers ahead of us so I am thrilled about where we're heading
link |
so basically I see this confluence in my lab and many other labs of AI of you know sort of you
link |
know the next frontier of AI for drug design so basically these sort of graph neural networks
link |
on specific chemical designs that allow you to create new generations of therapeutics these
link |
molecular biology tricks for intervening at the system at every level these personalized medicine
link |
prediction diagnosis and prognosis using the electronic health records and using these
link |
polygenic risk scores weighted by the burden the number of mutations that are accumulating
link |
across common rare and somatic variants the burden converging across all of these different
link |
molecular pathways the delivery of specific drugs and specific interventions into specific
link |
cell types and again you've talked with Bob Langer about this there's you know many giants in that
link |
field and then the last concept is not intervening at the single gene level I want you to sort of
link |
conceptualize the concept of an on target side effect what is an on target side effect an
link |
off target side effect is when you design a molecule to target one gene and instead it targets
link |
another gene and you have side effects because of that. An on-target side effect is when your
link |
molecule does exactly what you were expecting but that gene is pleiotropic. Pleio means many,
link |
tropos means ways, many ways. It acts in many ways; it's a multifunctional gene. So you find that
link |
this gene plays a role in this but as we talked about the wiring of genes to phenotypes is extremely
link |
dense and extremely complex so the next stage of intervention will be intervening not at the
link |
gene level but at the network level intervening at the set of pathways and the set of genes with
link |
multi input perturbations to the system multi input modulations pharmaceutical or other
link |
interventional and that basically allow you to now work at the sort of full level of understanding
link |
not just in your brain but across your body not just in one gene but across the set of pathways
link |
and so on so forth for every one of these disorders so I think that we're finally at the level of
link |
systems medicine of basically instead of sort of medicine being at the single gene level
link |
medicine being at the systems level where you can be personalized based on the specific
link |
set of genetic markers and genetic perturbations that you are either born with or that you have
link |
developed during your lifetime your unique set of exposures your unique set of biomarkers
link |
and you know your unique set of you know current set of conditions through your
link |
EHR and other ways and the precision component of intervening extremely precisely
link |
in the specific pathways and in specific combinations of genes that should be modulated
link |
to sort of bring you from the disease state to the physiologically normal state or even
link |
to a physiologically improved state through this combination of intervention so that that's in my
link |
view the field where basically computer science comes together with you know artificial intelligence
link |
statistics all of these other tools molecular biology technologies and biotechnology and
link |
pharmaceutical technologies that are sort of revolutionizing the way of intervention and
link |
of course this massive amount of molecular biology and data gathering and generation and
link |
perturbation in massively parallel ways so there's no better way there's no better you know time
link |
there's no better place to be sort of you know looking at this whole confluence of ideas
link |
and I'm just so thrilled to be a small part of this amazing enormous ecosystem.
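The polygenic risk scores mentioned in this answer are, in their simplest textbook form, a weighted sum of how many copies of each risk allele a person carries. The sketch below shows only that basic form; it is not the burden-weighted scheme across common, rare, and somatic variants described in the conversation, and the variant names, effect sizes, and the function `polygenic_risk_score` are hypothetical.

```python
def polygenic_risk_score(genotypes, effect_sizes):
    """Sum effect size times allele dosage over the variants we have weights for.

    genotypes: variant id -> number of risk alleles carried (0, 1, or 2)
    effect_sizes: variant id -> per-allele weight (e.g. from an association study)
    """
    return sum(
        effect_sizes[variant] * dosage
        for variant, dosage in genotypes.items()
        if variant in effect_sizes
    )


# Hypothetical individual and weights, for illustration only.
genotypes = {"variant_a": 2, "variant_b": 1, "variant_c": 0, "variant_d": 1}
effect_sizes = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}

score = polygenic_risk_score(genotypes, effect_sizes)
```

Variants without a known weight (here `variant_d`) are simply skipped, which is one possible design choice among several.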
link |
It's exciting to imagine what the humans of a hundred to twenty years from now what their life
link |
experience is like because these ideas seem to have potential to transform the quality of life
link |
that when they look back at us they probably wonder how we put up with all the suffering
link |
in the world. Manolis, it's a huge honor. Thank you for spending this early Sunday morning with me.
link |
I deeply appreciate it see you next time. Sounds like a plan thank you Lex. Thanks for listening
link |
to this conversation with Manolis Kellis and thank you to our sponsors SEMrush which is
link |
an SEO optimization tool, Pessimists Archive which is one of my favorite history podcasts,
link |
Eight Sleep which is a self cooling mattress with smart sensors and an app and finally BetterHelp
link |
which is an online therapy service. Please check out these sponsors in the description to get a
link |
discount and to support this podcast. If you enjoy this thing subscribe on YouTube review it with
link |
five stars on Apple Podcasts, follow on Spotify, support on Patreon or connect with me on Twitter
link |
@lexfridman and now let me leave you with some words from Haruki Murakami. Human beings are
link |
ultimately nothing but carriers, passageways for genes. They ride us into the ground like
link |
race horses from generation to generation. Genes don't think about what constitutes good or evil.
link |
They don't care whether we're happy or unhappy. We're just means to an end for them.
link |
The only thing they think about is what is most efficient for them.
link |
Thank you for listening and hope to see you next time.