
Dileep George: Brain-Inspired AI | Lex Fridman Podcast #115



link |
00:00:00.000
The following is a conversation with Dileep George, a researcher at the intersection of
link |
00:00:05.360
Neuroscience and Artificial Intelligence, cofounder of Vicarious with Scott Phoenix,
link |
00:00:10.880
and formerly cofounder of Numenta with Jeff Hawkins, who's been on this podcast, and
link |
00:00:16.800
Donna Dubinsky. From his early work on hierarchical temporal memory to recursive cortical networks
link |
00:00:23.520
to today, Dileep has always sought to engineer intelligence that is closely inspired by the
link |
00:00:29.600
human brain. As a side note, I think we understand very little about the fundamental principles
link |
00:00:35.760
underlying the function of the human brain, but the little we do know gives hints that may be
link |
00:00:41.600
more useful for engineering intelligence than any idea in mathematics, computer science, physics,
link |
00:00:46.960
and scientific fields outside of biology. And so the brain is a kind of existence proof that says
link |
00:00:53.120
it's possible. Keep at it. I should also say that brain-inspired AI is often overhyped and used as
link |
00:01:01.040
fodder for marketing speak, just as quantum computing is, but I'm not afraid of exploring these
link |
00:01:08.000
sometimes overhyped areas since where there's smoke, there's sometimes fire.
link |
00:01:13.680
Quick summary of the ads. Three sponsors, Babbel, Raycon Earbuds, and Masterclass. Please consider
link |
00:01:20.400
supporting this podcast by clicking the special links in the description to get the discount.
link |
00:01:25.760
It really is the best way to support this podcast. If you enjoy this thing, subscribe on YouTube,
link |
00:01:31.440
review it with five stars on Apple Podcast, support on Patreon, or connect with me on Twitter
link |
00:01:36.400
at Lex Fridman. As usual, I'll do a few minutes of ads now and never any ads in the middle that
link |
00:01:42.400
can break the flow of the conversation. This show is sponsored by Babbel, an app and website that
link |
00:01:48.960
gets you speaking in a new language within weeks. Go to babbel.com and use code LEX to get three
link |
00:01:54.480
months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian.
link |
00:02:03.040
Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts.
link |
00:02:10.560
Let me read a few lines from the Russian poem Noch, Ulitsa, Fonar, Apteka by Alexander Blok
link |
00:02:18.160
that you'll start to understand if you sign up to Babbel.
link |
00:02:34.720
Now I say that you'll only start to understand this poem because Russian starts with a language
link |
00:02:41.440
and ends with vodka. Now the latter part is definitely not endorsed or provided by Babbel
link |
00:02:47.600
and will probably lose me the sponsorship, but once you graduate from Babbel,
link |
00:02:51.760
you can enroll in my advanced course of late night Russian conversation over vodka.
link |
00:02:56.320
I have not yet developed an app for that. It's in progress. So get started by visiting babbel.com
link |
00:03:02.800
and use code LEX to get three months free. This show is sponsored by Raycon earbuds.
link |
00:03:09.360
Get them at buyraycon.com slash LEX. They've become my main method of listening to podcasts,
link |
00:03:14.960
audiobooks, and music when I run, do pushups and pull-ups, or just live life. In fact,
link |
00:03:20.880
I often listen to brown noise with them when I'm thinking deeply about something. It helps me focus.
link |
00:03:26.880
They're super comfortable, pair easily, great sound, great bass, six hours of playtime.
link |
00:03:33.920
I've been putting in a lot of miles to get ready for a potential ultra marathon
link |
00:03:38.080
and listening to audiobooks on World War II. The sound is rich and really comes in clear.
link |
00:03:45.760
So again, get them at buyraycon.com slash LEX. This show is sponsored by Masterclass.
link |
00:03:52.640
Sign up at masterclass.com slash LEX to get a discount and to support this podcast.
link |
00:03:57.840
When I first heard about Masterclass, I thought it was too good to be true. I still think it's
link |
00:04:02.400
too good to be true. For 180 bucks a year, you get an all access pass to watch courses from,
link |
00:04:08.160
to list some of my favorites: Chris Hadfield on Space Exploration, Neil deGrasse Tyson on
link |
00:04:13.360
Scientific Thinking and Communication, Will Wright, creator of SimCity and The Sims, on Game Design.
link |
00:04:19.280
Every time I do this read, I really want to play a city builder game. Carlos Santana on guitar,
link |
00:04:26.240
Garry Kasparov on chess, Daniel Negreanu on poker, and many more. Chris Hadfield explaining how rockets
link |
00:04:32.640
work and the experience of being launched into space alone is worth the money. By the way,
link |
00:04:38.160
you can watch it on basically any device. Once again, sign up at masterclass.com slash LEX to get a discount
link |
00:04:43.600
and to support this podcast. And now here's my conversation with Dileep George. Do you think
link |
00:04:50.960
we need to understand the brain in order to build it? Yes. If you want to build the brain, we
link |
00:04:56.400
definitely need to understand how it works. Blue Brain or Henry Markram's project is trying to
link |
00:05:04.160
build a brain without understanding it, just trying to put details of the brain from neuroscience
link |
00:05:11.920
experiments into a giant simulation by putting more and more neurons, more and more details.
link |
00:05:18.160
But that is not going to work because when it doesn't perform the way you expect it to,
link |
00:05:26.560
then what do you do? You just keep adding more details. How do you debug it? So unless you
link |
00:05:32.720
understand, unless you have a theory about how the system is supposed to work, how the pieces are
link |
00:05:37.360
supposed to fit together, what they're going to contribute, you can't build it. At the functional
link |
00:05:42.400
level, understand. So can you actually linger on and describe the Blue Brain project? It's kind of
link |
00:05:48.560
a fascinating principle and idea to try to simulate the brain. We're talking about the human
link |
00:05:56.080
brain, right? Right. Human brains and rat brains or cat brains have lots in common that the cortex,
link |
00:06:03.600
the neocortex structure is very similar. So initially they were trying to just simulate
link |
00:06:11.200
a cat brain. To understand the nature of evil. To understand the nature of evil. Or as it happens
link |
00:06:21.040
in most of these simulations, you easily get one thing out, which is oscillations. If you simulate
link |
00:06:29.120
a large number of neurons, they oscillate and you can adjust the parameters and say that,
link |
00:06:35.200
oh, oscillations match the rhythm that we see in the brain, et cetera. I see. So the idea is,
link |
00:06:43.280
is the simulation at the level of individual neurons? Yeah. So the Blue Brain project,
link |
00:06:49.040
the original idea as proposed was you put very detailed biophysical neurons, biophysical models
link |
00:06:59.200
of neurons, and you interconnect them according to the statistics of connections that we have found
link |
00:07:06.320
from real neuroscience experiments, and then turn it on and see what happens. And these neural
link |
00:07:14.240
models are incredibly complicated in themselves, right? Because these neurons are modeled using
link |
00:07:22.080
this idea called Hodgkin-Huxley models, which are about how signals propagate in a cable.
link |
00:07:28.240
And there are active dendrites, all those phenomena, which those phenomena themselves,
link |
00:07:34.000
we don't understand that well. And then we put in connectivity, which is part guesswork,
link |
00:07:40.960
part observed. And of course, if we do not have any theory about how it is supposed to work,
link |
00:07:48.960
we just have to take whatever comes out of it as, okay, this is something interesting.
link |
00:07:54.800
But in your sense, these models of the way signal travels along,
link |
00:07:59.440
like with the axons and all the basic models, they're too crude.
link |
00:08:04.320
Oh, well, actually, they are pretty detailed and pretty sophisticated. And they do replicate
link |
00:08:12.960
the neural dynamics. If you take a single neuron and you try to turn on the different channels,
link |
00:08:20.800
the calcium channels and the different receptors, and see what the effect of turning on or off those
link |
00:08:28.400
channels are in the neuron's spike output, people have built pretty sophisticated models of that.
link |
00:08:35.360
And they are, I would say, in the regime of being correct.
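To give a concrete sense of what these biophysical models look like, here is a minimal sketch of the classic single-compartment Hodgkin-Huxley equations in Python (standard textbook squid-axon parameters, simple Euler integration). It only illustrates the ion-channels-as-electric-circuit idea being discussed here; the models used in projects like Blue Brain are far more detailed, with many compartments and active dendrites.

```python
# A minimal single-compartment Hodgkin-Huxley sketch (textbook squid-axon parameters).
# Purely illustrative of "ion channels as an electric circuit", not a Blue Brain model.
import numpy as np

# Maximal conductances (mS/cm^2), reversal potentials (mV), membrane capacitance (uF/cm^2)
g_Na, g_K, g_L = 120.0, 36.0, 0.3
E_Na, E_K, E_L = 50.0, -77.0, -54.387
C_m = 1.0

# Voltage-dependent channel gating rates (voltage in mV)
def alpha_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def beta_m(V):  return 4.0 * np.exp(-(V + 65.0) / 18.0)
def alpha_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def beta_h(V):  return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def alpha_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def beta_n(V):  return 0.125 * np.exp(-(V + 65.0) / 80.0)

def simulate(I_inj=10.0, T=50.0, dt=0.01):
    """Euler-integrate the membrane voltage under a constant injected current (ms, mV)."""
    steps = int(T / dt)
    V, m, h, n = -65.0, 0.05, 0.6, 0.32   # typical resting-state values
    trace = np.empty(steps)
    for t in range(steps):
        # Each channel behaves like a conductance in series with a battery
        I_Na = g_Na * m**3 * h * (V - E_Na)
        I_K  = g_K  * n**4     * (V - E_K)
        I_L  = g_L             * (V - E_L)
        # Membrane equation: C dV/dt = injected current minus ionic currents
        V += dt * (I_inj - I_Na - I_K - I_L) / C_m
        # Gating variables relax toward voltage-dependent steady states
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        trace[t] = V
    return trace

trace = simulate()
print("peak membrane voltage (mV):", round(float(trace.max()), 1))  # spikes overshoot to roughly +40 mV
```

With a constant injected current this single compartment fires repetitively, and matching that kind of spiking dynamics to recordings is roughly the sense in which such models are judged "correct" in the discussion that follows.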
link |
00:08:41.120
Well, see, the correctness, that's interesting, because you mentioned at several levels,
link |
00:08:45.680
the correctness is measured by looking at some kind of aggregate statistics.
link |
00:08:49.440
It would be more of the spiking dynamics of a single neuron.
link |
00:08:53.200
Spiking dynamics of a single neuron, okay.
link |
00:08:54.960
Yeah. And yeah, these models, because they are going to the level of mechanism,
link |
00:09:00.640
so they are basically looking at, okay, what is the effect of turning on an ion channel?
link |
00:09:07.760
And you can model that using electric circuits. So it is not just a function fitting. People are
link |
00:09:17.040
looking at the mechanism underlying it and putting that in terms of electric circuit theory, signal
link |
00:09:23.600
propagation theory, and modeling that. So those models are sophisticated, but getting a single
link |
00:09:31.760
neuron's model 99% right still does not tell you how to... It would be the analog of getting a
link |
00:09:40.800
transistor model right and now trying to build a microprocessor. And if you did not understand how
link |
00:09:50.320
a microprocessor works, but you say, oh, I now can model one transistor well, and now I will just
link |
00:09:57.360
try to interconnect the transistors according to whatever I could guess from the experiments
link |
00:10:03.840
and try to simulate it, then it is very unlikely that you will produce a functioning microprocessor.
link |
00:10:12.080
When you want to produce a functioning microprocessor, you want to understand Boolean
link |
00:10:16.080
logic, how do the gates work, all those things, and then understand how do those gates get
link |
00:10:22.480
implemented using transistors. Yeah. This reminds me, there's a paper,
link |
00:10:26.960
maybe you're familiar with it, that I remember going through in a reading group that
link |
00:10:31.600
approaches a microprocessor from the perspective of a neuroscientist. I think it basically,
link |
00:10:38.400
it uses all the tools that we have of neuroscience to try to understand,
link |
00:10:42.960
like as if aliens just showed up to study computers, to see if those tools could be
link |
00:10:49.920
used to get any kind of sense of how the microprocessor works. I think the final,
link |
00:10:54.640
the takeaway from at least this initial exploration is that we're screwed. There's no
link |
00:11:01.280
way that the tools of neuroscience would be able to get us to anything, like not even
link |
00:11:05.440
Boolean logic. I mean, it's just any aspect of the architecture of the function of the
link |
00:11:15.680
processes involved, the clocks, the timing, all that, you can't figure that out from the
link |
00:11:21.520
tools of neuroscience. Yeah. So I'm very familiar with this particular
link |
00:11:25.600
paper. I think it was called Could a Neuroscientist Understand a Microprocessor, or something like
link |
00:11:33.440
that. Following the methodology in that paper, even an electrical engineer would not understand
link |
00:11:39.200
microprocessors. So I don't think it is that bad in the sense of saying, neuroscientists do
link |
00:11:49.040
find valuable things by observing the brain. They do find good insights, but those insights cannot
link |
00:11:58.640
be put together just as a simulation. You have to investigate what are the computational
link |
00:12:05.600
underpinnings of those findings. How do all of them fit together from an information processing
link |
00:12:13.920
and information processing perspective? Somebody has to painstakingly put those things together
link |
00:12:21.120
and build hypotheses. So I don't want to diss all neuroscientists, saying, oh, they're not
link |
00:12:26.160
finding anything. No, that paper almost went to that level of neuroscientists will never
link |
00:12:31.840
understand. No, that's not true. I think they do find lots of useful things, but it has to be put
link |
00:12:37.760
together in a computational framework. Yeah. I mean, but you know, just the AI systems will be
link |
00:12:43.760
listening to this podcast a hundred years from now and they will probably, there's some nonzero
link |
00:12:50.160
probability they'll find your words laughable. There's like, I remember humans thought they
link |
00:12:55.120
understood something about the brain. They were totally clueless. There's a sense about neuroscience
link |
00:12:59.680
that we may be in the very, very early days of understanding the brain. But I mean, that's one
link |
00:13:06.160
perspective. I mean, in your perspective, how far are we into understanding any aspect of the brain?
link |
00:13:18.080
So, from the dynamics of individual neuron communication, to how,
link |
00:13:24.320
in a collective sense, they're able to store information, transfer information, how
link |
00:13:31.200
intelligence then emerges, all that kind of stuff. Where are we on that timeline?
link |
00:13:35.040
Yeah. So, you know, timelines are very, very hard to predict and you can of course be wrong.
link |
00:13:40.720
And you can be wrong on either side. You know, we know that now, when we look back, the first
link |
00:13:48.080
flight was in 1903. In 1900, there was a New York Times article on flying machines that do not fly
link |
00:13:57.920
and, you know, humans might not fly for another hundred years. That was what that
link |
00:14:03.360
article stated. But no, they flew three years after that. So it is, you know,
link |
00:14:08.880
it's very hard to, so... Well, and on that point, one of the Wright brothers,
link |
00:14:15.120
I think two years before, said, like, some number, like 50 years,
link |
00:14:23.280
he had become convinced that it's impossible. Even during their experimentation.
link |
00:14:31.040
Yeah. Yeah. I mean, that's a tribute to the entrepreneurial battle, the
link |
00:14:36.400
depression of going through it, just thinking this is impossible. But yeah,
link |
00:14:41.280
there's something there: even the person that's in it is not able to estimate correctly.
link |
00:14:47.280
Exactly. But I can, I can tell from the point of, you know, objectively, what are the things that we
link |
00:14:52.480
know about the brain and how that can be used to build AI models, which can then go back and
link |
00:14:58.560
inform how the brain works. So my way of understanding the brain would be to basically say,
link |
00:15:04.080
look at the insights neuroscientists have found, understand that from a computational angle,
link |
00:15:11.040
information processing angle, build models using that. And then building that model, which,
link |
00:15:18.080
which functions, which is a functional model, which is, which is doing the task that we want
link |
00:15:22.880
the model to do. It is not just trying to model a phenomenon in the brain. It is trying to
link |
00:15:27.920
do what the brain is trying to do on the, on the whole functional level. And building that model
link |
00:15:33.360
will help you fill in the missing pieces that, you know, biology just gives you the hints and
link |
00:15:39.920
building the model, you know, fills in the rest of the, the pieces of the puzzle. And then you
link |
00:15:44.960
can go and connect that back to biology and say, okay, now it makes sense that this part of the
link |
00:15:51.280
brain is doing this, or this layer in the cortical circuit is doing this. And then continue this
link |
00:15:59.920
iteratively because now that will inform new experiments in neuroscience. And of course,
link |
00:16:05.840
you know, building the model and verifying that in the real world will also tell you more about,
link |
00:16:11.600
does the model actually work? And you can refine the model, find better ways of putting these
link |
00:16:17.440
neuroscience insights together. So, I would say,
link |
00:16:23.360
neuroscientists alone, just from experimentation, will not be able to build a model
link |
00:16:28.800
of the brain or a functional model of the brain. You know, there are lots of efforts,
link |
00:16:35.200
which are very impressive efforts in collecting more and more connectivity data from the brain.
link |
00:16:41.200
You know, how, how are the microcircuits of the brain connected with each other?
link |
00:16:45.520
Those are beautiful, by the way.
link |
00:16:47.120
Those are beautiful. And at the same time, those do not by themselves
link |
00:16:54.880
convey the story of how it works. And somebody has to understand, okay,
link |
00:17:00.080
why are they connected like that? And what are those things doing? And we do that by
link |
00:17:06.320
building models in AI using hints from neuroscience, and repeat the cycle.
link |
00:17:11.200
So what aspect of the brain are useful in this whole endeavor, which by the way, I should say,
link |
00:17:18.720
you're, you're both a neuroscientist and an AI person. I guess the dream is to both understand
link |
00:17:24.960
the brain and to build AGI systems. So you're, it's like an engineer's perspective of trying
link |
00:17:32.320
to understand the brain. So what aspects of the brain, functionally speaking, like you said,
link |
00:17:37.600
do you find interesting?
link |
00:17:38.800
Yeah, quite a lot of things. All right. So one is, you know, if you look at the visual cortex
link |
00:17:46.160
and, you know, the visual cortex is a large part of the brain. I forget the exact
link |
00:17:51.920
fraction, but a huge part of our brain area is occupied by just vision.
link |
00:17:59.040
So vision, visual cortex is not just a feed forward cascade of neurons. There are a lot
link |
00:18:06.320
more feedback connections in the brain compared to the feed forward connections. And, and it is
link |
00:18:11.680
surprising to the level of detail neuroscientists have actually studied this. If you, if you go into
link |
00:18:17.120
neuroscience literature and poke around and ask, you know, have they studied what will be the effect
link |
00:18:22.960
of poking a neuron in level IT on level V1? And have they studied that? And you will say, yes,
link |
00:18:33.680
they have studied that.
link |
00:18:34.560
So every part of every possible combination.
link |
00:18:38.400
I mean, it's not a random exploration at all. It's very hypothesis driven,
link |
00:18:43.040
right? Like they, they are very experimental. Neuroscientists are very, very systematic
link |
00:18:47.520
in how they probe the brain because experiments are very costly to conduct. They take a lot of
link |
00:18:52.800
preparation. They, they need a lot of control. So they, they are very hypothesis driven in how
link |
00:18:57.520
they probe the brain. And often what I find is that when we have a question in AI about
link |
00:19:05.840
has anybody probed how lateral connections in the brain work? And when you go and read the
link |
00:19:11.440
literature, yes, people have probed it and people have probed it very systematically. And, and they
link |
00:19:16.160
have hypotheses about how those lateral connections are supposedly contributing to visual processing.
link |
00:19:23.600
But of course they haven't built very, very functional, detailed models of it.
link |
00:19:27.840
By the way, in those studies, sorry to interrupt, do they stimulate, like,
link |
00:19:32.480
a neuron in one particular area of the visual cortex and then see how the signal
link |
00:19:37.520
travels, kind of thing?
link |
00:19:38.800
Fascinating, very, very fascinating experiments. So I can, I can give you one example I was
link |
00:19:43.040
impressed with. Before going to that, let me give you, you know, an overview of
link |
00:19:50.160
how the layers in the cortex are organized, right? The visual cortex is organized into roughly
link |
00:19:56.160
four hierarchical levels. Okay. So V1, V2, V4, IT. And in V1...
link |
00:20:02.720
What happened to V3?
link |
00:20:03.920
Well, yeah, that's another pathway. Okay. So I'm talking about just the object
link |
00:20:08.880
recognition pathway.
link |
00:20:09.920
All right, cool.
link |
00:20:10.880
And then in V1 itself, there is a very detailed microcircuit. That is,
link |
00:20:19.120
there is organization within a level itself. The cortical sheet is organized into, you know,
link |
00:20:25.040
multiple layers and there is a columnar structure. And this layer-wise and columnar
link |
00:20:31.440
structure is repeated in V1, V2, V4, IT, all of them, right? And, and the connections between
link |
00:20:38.800
these layers within a level, you know, in V1 itself, there are six layers roughly, and the
link |
00:20:44.480
connections between them, there is a particular structure to them. And now, so one example
link |
00:20:51.200
of an experiment people did is, when you present a stimulus which, let's say,
link |
00:21:00.400
requires separating the foreground from the background of an object. So it is, it's a
link |
00:21:06.240
textured triangle on a textured background. And you can check, does the surface settle
link |
00:21:14.880
first or does the contour settle first?
link |
00:21:19.040
Settle?
link |
00:21:19.600
Settle in the sense that, when you finally form the percept of the triangle,
link |
00:21:28.080
you understand where the contours of the triangle are, and you also know where the inside of
link |
00:21:32.720
the triangle is, right? That's when you form the final percept. Now you can ask, what is
link |
00:21:39.200
the dynamics of forming that final percept? Do the neurons first find the edges
link |
00:21:48.880
and converge on where the edges are, and then they find the inner surfaces, or does it go
link |
00:21:55.120
the other way around?
link |
00:21:55.600
The other way around. So what's the answer?
link |
00:21:58.320
In this case, it turns out that it first settles on the edges. It converges on the edge hypothesis
link |
00:22:05.280
first, and then the surfaces are filled in from the edges to the inside.
link |
00:22:10.880
That's fascinating.
link |
00:22:12.000
And the detail to which you can study this, it's amazing that you can actually not only
link |
00:22:18.640
find the temporal dynamics of when this happens, and then you can also find which layer in
link |
00:22:25.520
the, you know, in V1, which layer is encoding the edges, which layer is encoding the surfaces,
link |
00:22:32.960
and which layer is encoding the feedback, which layer is encoding the feed forward,
link |
00:22:37.440
and what's the combination of them that produces the final percept.
link |
00:22:42.000
And these kinds of experiments stand out when you try to explain illusions. One example
link |
00:22:48.400
of a favorite illusion of mine is the Kanizsa triangle. I don't know if you are familiar
link |
00:22:51.920
with this one. So this is an example where it's a triangle, but only the corners of the
link |
00:23:00.960
triangle are shown in the stimulus. So they look kind of like Pac-Man.
link |
00:23:06.080
Oh, the black Pac-Man.
link |
00:23:07.600
Exactly.
link |
00:23:08.640
And then you start to see.
link |
00:23:10.000
Your visual system hallucinates the edges. And when you look at it, you will see a faint
link |
00:23:16.400
edge. And you can go inside the brain and look, do actually neurons signal the presence
link |
00:23:24.160
of this edge? And if they signal, how do they do it? Because they are not receiving anything
link |
00:23:30.320
from the input. The input is blank for those neurons. So how do they signal it? When does
link |
00:23:37.840
the signaling happen? So if a real contour is present in the input, then the neurons
link |
00:23:45.440
immediately signal, okay, there is an edge here. When it is an illusory edge, it is clearly
link |
00:23:52.400
not in the input. It is coming from the context. So those neurons fire later. And you can say
link |
00:23:58.720
that, okay, it's the feedback connection that is causing them to fire. And they happen later.
link |
00:24:05.920
And you can find the dynamics of them. So these studies are pretty impressive and very detailed.
link |
00:24:13.280
So by the way, just a step back, you said that there may be more feedback connections
link |
00:24:20.080
than feed forward connections. First of all, just for the machine learning folks,
link |
00:24:27.360
I mean, that's crazy that there's all these feedback connections. We often think about,
link |
00:24:36.400
thanks to deep learning, you start to think about the human brain as a kind of feed forward
link |
00:24:42.720
mechanism. So what the heck are these feedback connections? What's the dynamics? What are we
link |
00:24:52.960
supposed to think about them? So this fits into a very beautiful picture about how the brain works.
link |
00:24:59.360
So the beautiful picture of how the brain works is that our brain is building a model of the world.
link |
00:25:06.080
You know, so our visual system is building a model of how objects behave in the world. And we are
link |
00:25:13.920
constantly projecting that model back onto the world. So what we are seeing is not just a feed
link |
00:25:20.240
forward thing that just gets interpreted in a feed forward part. We are constantly projecting
link |
00:25:25.280
our expectations onto the world. And the final percept is a combination of what we project
link |
00:25:31.600
onto the world combined with what the actual sensory input is. Almost like trying to calculate
link |
00:25:37.920
the difference and then trying to interpret the difference. Yeah. I wouldn't put it as calculating
link |
00:25:44.000
the difference. It's more like what is the best explanation for the input stimulus based on the
link |
00:25:50.640
model of the world I have. Got it. And that's where all the illusions come in. But that's an
link |
00:25:56.560
incredibly efficient process. So the feedback mechanism, it just helps you constantly. Yeah.
link |
00:26:05.360
So hallucinate how the world should be based on your world model and then just looking at
link |
00:26:11.680
if there's novelty, like trying to explain it. Hence, that's why we detect movement
link |
00:26:19.680
really well. There's all these kinds of things. And this is like at all different levels of the
link |
00:26:25.360
cortex you're saying. This happens at the lowest level or the highest level. Yes. Yeah. In fact,
link |
00:26:30.480
feedback connections are prevalent everywhere in the cortex. And so one way to
link |
00:26:36.640
think about it, and there's a lot of evidence for this, is inference. So basically, if you have a
link |
00:26:42.800
model of the world and when some evidence comes in, what you are doing is inference. You are trying
link |
00:26:50.160
to now explain this evidence using your model of the world. And this inference includes projecting
link |
00:26:58.240
your model onto the evidence and taking the evidence back into the model and doing an
link |
00:27:04.720
iterative procedure. And this iterative procedure is what happens using the feed forward feedback
link |
00:27:11.840
propagation. And feedback affects what you see in the world, and it also affects feed forward
link |
00:27:17.680
propagation. And examples are everywhere. We see these kinds of things everywhere. The idea that
link |
00:27:25.840
there can be multiple competing hypotheses in our model trying to explain the same evidence,
link |
00:27:32.480
and then you have to kind of make them compete. And one hypothesis will explain away the other
link |
00:27:39.440
hypothesis through this competition process. So you have competing models of the world
link |
00:27:46.800
that try to explain. What do you mean by explain away?
link |
00:27:50.000
So this is a classic example in graphical models, probabilistic models.
link |
00:27:56.800
What are those?
link |
00:28:01.120
I think it's useful to mention because we'll talk about them more.
link |
00:28:05.120
So neural networks are one class of machine learning models. You have distributed set of
link |
00:28:12.800
nodes, which are called the neurons. Each one is doing a dot product and you can approximate
link |
00:28:18.160
any function using this multilevel network of neurons. So that's a class of models which are
link |
00:28:24.720
useful for function approximation. There is another class of models in machine learning
link |
00:28:30.480
called probabilistic graphical models. And you can think of them as each node in that model is a
link |
00:28:38.800
variable, which is talking about something. It can be a variable representing, is an edge present
link |
00:28:46.160
in the input or not? And at the top of the network, a node can be representing, is there an object
link |
00:28:56.000
present in the world or not? So it is another way of encoding knowledge. And then once you
link |
00:29:06.960
encode the knowledge, you can do inference in the right way. What is the best way to
link |
00:29:15.280
explain some set of evidence using this model that you encoded? So when you encode the model,
link |
00:29:20.880
you are encoding the relationship between these different variables. How is the edge
link |
00:29:24.800
connected to the model of the object? How is the surface connected to the model of the object?
link |
00:29:29.600
And then, of course, this is a very distributed, complicated model. And inference is, how do you
link |
00:29:37.120
explain a piece of evidence when a set of stimulus comes in? If somebody tells me there is a 50%
link |
00:29:42.960
probability that there is an edge here in this part of the model, how does that affect my belief
link |
00:29:47.840
on whether I should think that there is a square present in the image? So this is the process of
link |
00:29:54.960
inference. So one example of inference is having this explaining away effect between multiple causes.
link |
00:30:02.080
So graphical models can be used to represent causality in the world. So let's say, you know,
link |
00:30:10.800
your alarm at home can be triggered by a burglar getting into your house, or it can be triggered
link |
00:30:22.480
by an earthquake. Both can be causes of the alarm going off. So now, you're in your office,
link |
00:30:30.640
you heard the burglar alarm going off, you are heading home, thinking that a burglar got in. But
link |
00:30:36.880
while driving home, if you hear on the radio that there was an earthquake in the vicinity,
link |
00:30:41.520
now your strength of evidence for a burglar getting into your house is diminished. Because
link |
00:30:49.760
now that piece of evidence is explained by the earthquake being present. So if you think about
link |
00:30:56.000
these two causes explaining a lower level variable, which is the alarm, now, what we're seeing
link |
00:31:01.760
is that increasing the evidence for some cause, you know, there is evidence coming in from below
link |
00:31:08.000
for alarm being present. And initially, it was flowing to a burglar being present. But now,
link |
00:31:14.160
since there is side evidence for this other cause, it explains away this evidence and evidence will
link |
00:31:20.800
now flow to the other cause. This is, you know, two competing causal things trying to explain
link |
00:31:26.320
the same evidence. And the brain has a similar kind of mechanism for doing so.
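The burglar/earthquake/alarm story is the textbook explaining-away setup, and a tiny enumeration makes the effect concrete. The probabilities below are invented for illustration: evidence for the alarm raises belief in a burglar, and adding evidence for an earthquake pulls that belief back down.

```python
# Explaining away in a tiny Bayes net: Burglar -> Alarm <- Earthquake.
# All probabilities below are invented purely for illustration.
from itertools import product

P_b = 0.001            # prior probability of a burglary
P_e = 0.002            # prior probability of an earthquake
def P_alarm(b, e):     # conditional probability table for the alarm given its two causes
    return {(0, 0): 0.001, (1, 0): 0.94, (0, 1): 0.29, (1, 1): 0.95}[(b, e)]

def joint(b, e, a):
    p = (P_b if b else 1 - P_b) * (P_e if e else 1 - P_e)
    return p * (P_alarm(b, e) if a else 1 - P_alarm(b, e))

def posterior_burglar(**observed):
    """P(burglar = 1 | observed) by brute-force enumeration over the joint."""
    num = den = 0.0
    for b, e, a in product([0, 1], repeat=3):
        world = {"burglar": b, "earthquake": e, "alarm": a}
        if any(world[k] != v for k, v in observed.items()):
            continue
        den += joint(b, e, a)
        if b == 1:
            num += joint(b, e, a)
    return num / den

print(posterior_burglar(alarm=1))                 # ~0.37 : the alarm alone suggests a burglar
print(posterior_burglar(alarm=1, earthquake=1))   # ~0.003: the earthquake explains the alarm away
```

The same evidence (the alarm) ends up flowing toward whichever cause has independent support, which is exactly the competition between hypotheses described above.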
link |
00:31:31.840
That's kind of interesting. And how's that all encoded in the brain? Like, where's the storage of information?
link |
00:31:39.280
Are we talking just maybe to get it a little bit more specific? Is it in the hardware of the actual
link |
00:31:46.160
connections? Is it in chemical communication? Is it electrical communication? Do we know?
link |
00:31:53.120
So this is, you know, a paper that we are bringing out soon.
link |
00:31:56.640
Which one is this?
link |
00:31:57.680
This is the cortical microcircuits paper that I sent you a draft of. Of course, a lot of
link |
00:32:03.920
this is still hypothesis. One hypothesis is that you can think of a cortical column
link |
00:32:09.840
as encoding a concept. You know, think of an example of a concept: is an edge
link |
00:32:20.800
present or not? Or is an object present or not? Okay, so you can think of it as a binary variable,
link |
00:32:27.280
a binary random variable. The presence of an edge or not, or the presence of an object or not.
link |
00:32:32.000
So each cortical column can be thought of as representing that one concept, one variable.
link |
00:32:38.080
And then the connections between these cortical columns are basically encoding the relationship
link |
00:32:43.680
between these random variables. And then there are connections within the cortical column.
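As a rough sketch of this columns-as-variables picture (variable names and compatibility numbers are invented here for illustration, not taken from the paper), the "knowledge" can be read off the connection between two columns as a small compatibility table, and inference over it looks like this:

```python
# A toy rendering of "columns are binary variables, connections store the knowledge".
# Names and numbers are invented for illustration only.
import numpy as np

# Each "column" is a binary variable: is this feature/object present or not?
columns = ["edge_present", "square_present"]

# The "connection" between two columns is a 2x2 compatibility table
# phi[edge_state, square_state]: how well each joint assignment fits the stored knowledge.
phi = np.array([[1.0, 0.1],    # edge absent : square absent / square present
                [0.5, 2.0]])   # edge present: square absent / square present

def marginal_square(edge_evidence):
    """Belief in square_present given soft evidence on the edge column, by enumeration."""
    # edge_evidence[x] is the likelihood of the observed input given edge state x.
    scores = np.zeros(2)
    for edge in (0, 1):
        for square in (0, 1):
            scores[square] += edge_evidence[edge] * phi[edge, square]
    return scores[1] / scores.sum()

print(marginal_square([0.9, 0.1]))  # weak edge evidence  -> low belief in a square (~0.23)
print(marginal_square([0.1, 0.9]))  # strong edge evidence -> high belief in a square (~0.77)
```

In the hypothesis being described, the machinery inside each column and in the thalamus, discussed next, is what would actually carry out this kind of computation.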
link |
00:32:49.360
Each cortical column is implemented using multiple layers of neurons with very, very,
link |
00:32:54.320
very rich structure there. You know, there are thousands of neurons in a cortical column.
link |
00:33:00.240
But that structure is similar across the different cortical columns.
link |
00:33:03.520
Correct. And also these cortical columns connect to a substructure called thalamus.
link |
00:33:10.160
So all cortical columns pass through this substructure. So our hypothesis is that
link |
00:33:17.120
the connections between the cortical columns implement this, you know, that's where the
link |
00:33:21.600
knowledge is stored about how these different concepts connect to each other. And then the
link |
00:33:28.800
neurons inside this cortical column and in thalamus in combination implement this actual
link |
00:33:35.760
computation for inference, which includes explaining away and competing between the
link |
00:33:41.040
different hypotheses. And it is all very... So what is amazing is that neuroscientists have
link |
00:33:49.280
actually done experiments to the tune of showing these things. They might not be putting it in the
link |
00:33:55.920
overall inference framework, but they will show things like, if I poke this higher level neuron,
link |
00:34:03.120
it will inhibit through this complicated loop through thalamus, it will inhibit this other
link |
00:34:07.920
column. So they will do such experiments. But do they use terminology of concepts,
link |
00:34:14.080
for example? So, I mean, is it something where it's easy to anthropomorphize
link |
00:34:22.960
and think about concepts like you started moving into logic based kind of reasoning systems. So
link |
00:34:31.200
I would just think of concepts in that kind of way, or is it a lot messier, a lot more gray area,
link |
00:34:40.400
you know, even more gray, even more messy than the artificial neural network
link |
00:34:47.200
kinds of abstractions? The easiest way is to think of it as a variable,
link |
00:34:50.480
right? It's a binary variable, which is showing the presence or absence of something.
link |
00:34:55.360
So, but I guess what I'm asking is, is that something that we're supposed to think of
link |
00:35:01.440
as something that's human interpretable?
link |
00:35:04.080
It doesn't need to be. It doesn't need to be human interpretable. There's no need for it to
link |
00:35:07.920
be human interpretable. But it's almost like you will be able to find some interpretation of it
link |
00:35:17.440
because it is connected to the other things that you know about.
link |
00:35:20.800
Yeah. And the point is it's useful somehow.
link |
00:35:23.840
Yeah. It's useful as an entity in the graph,
link |
00:35:29.520
in connecting to the other entities that are, let's call them concepts.
link |
00:35:33.280
Right. Okay. So, by the way, are these the cortical microcircuits?
link |
00:35:38.880
Correct. These are the cortical microcircuits. You know, that's what neuroscientists use to
link |
00:35:43.120
talk about the circuits within a level of the cortex. So, you can think of, you know,
link |
00:35:49.840
let's think of a neural network, artificial neural network terms. People talk about the
link |
00:35:54.960
architecture of how many layers they build, what is the fan in, fan out, et cetera. That is the
link |
00:36:01.600
macro architecture. And then within a layer of the neural network, the cortical neural network
link |
00:36:11.120
is much more structured within a level. There's a lot more intricate structure there. But even
link |
00:36:18.160
within an artificial neural network, you can think of feature detection plus pooling as one
link |
00:36:23.520
level. And so, that is kind of a microcircuit. It's much more complex in the real brain. And so,
link |
00:36:32.880
within a level, whatever is that circuitry within a column of the cortex and between the layers of
link |
00:36:38.080
the cortex, that's the microcircuitry. I love that terminology. Machine learning
link |
00:36:43.040
people don't use the circuit terminology. Right.
link |
00:36:45.760
But they should. It's nice. So, okay. Okay. So, that's the cortical microcircuit. So,
link |
00:36:53.920
what's interesting about, what can we say, what does the paper that you're working on
link |
00:37:00.640
propose about the ideas around these cortical microcircuits?
link |
00:37:04.320
So, this is a fully functional model for the microcircuits of the visual cortex.
link |
00:37:10.640
So, the paper focuses on your idea and our discussion now is focusing on vision.
link |
00:37:15.520
Yeah. The visual cortex. Okay. So,
link |
00:37:18.800
this is a model. This is a full model. This is how vision works.
link |
00:37:22.880
But this is a hypothesis. Okay. So, let me step back a bit. So, we looked at neuroscience for
link |
00:37:32.000
insights on how to build a vision model. Right.
link |
00:37:35.280
And we synthesized all those insights into a computational model. This is called the recursive
link |
00:37:40.560
cortical network model that we used for breaking CAPTCHAs. And we are using the same model for
link |
00:37:47.760
robotic picking and tracking of objects. And that, again, is a vision system.
link |
00:37:52.320
That's a vision system. Computer vision system.
link |
00:37:54.400
That's a computer vision system. Takes in images and outputs what?
link |
00:37:59.120
On one side, it outputs the class of the image and also segments the image. And you can also ask it
link |
00:38:06.560
further queries. Where is the edge of the object? Where is the interior of the object? So, it's a
link |
00:38:11.600
model that you build to answer multiple questions. So, you're not trying to build a model for just
link |
00:38:17.120
classification or just segmentation, et cetera. It's a joint model that can do multiple things.
link |
00:38:23.440
So, that's the model that we built using insights from neuroscience. And some of those insights are
link |
00:38:30.080
what is the role of feedback connections? What is the role of lateral connections? So,
link |
00:38:34.160
all those things went into the model. The model actually uses feedback connections.
link |
00:38:38.800
All these ideas from neuroscience. Yeah.
link |
00:38:41.440
So, what the heck is a recursive cortical network? What are the architecture approaches,
link |
00:38:47.200
interesting aspects here, which is essentially a brain inspired approach to computer vision?
link |
00:38:54.400
Yeah. So, there are multiple layers to this question. I can go from the very,
link |
00:38:58.880
very top and then zoom in. Okay. So, one important thing, constraint that went into the model is that
link |
00:39:05.840
you should not think of vision as something in isolation. We should not think of
link |
00:39:11.600
perception as just a preprocessor for cognition. Perception and cognition are interconnected.
link |
00:39:19.200
And so, you should not think of one problem in separation from the other problem. And so,
link |
00:39:24.800
that means if you finally want to have a system that understand concepts about the world and can
link |
00:39:30.720
learn a very conceptual model of the world and can reason and connect to language, all of those
link |
00:39:36.000
things, you need to think all the way through and make sure that your perception system
link |
00:39:41.920
is compatible with your cognition system and language system and all of them.
link |
00:39:45.920
And one aspect of that is top down controllability. What does that mean?
link |
00:39:52.320
So, that means, you know, so think of, you know, you can close your eyes and think about
link |
00:39:58.480
the details of one object, right? I can zoom in further and further. So, think of the bottle in
link |
00:40:05.600
front of me, right? And now, you can think about, okay, what the cap of that bottle looks like.
link |
00:40:11.280
Now, you can think about what's the texture on the cap of that bottle. You know, you can think
link |
00:40:18.000
about, you know, what will happen if something hits that. So, you can manipulate your visual
link |
00:40:25.760
knowledge in cognition driven ways. Yes. And so, this top down controllability and being able to
link |
00:40:35.520
simulate scenarios in the world. So, you're not just a passive player in this perception game.
link |
00:40:43.920
You can control it. You have imagination. Correct. Correct. So, basically, you know,
link |
00:40:50.320
basically having a generative network, which is a model and it is not just some arbitrary
link |
00:40:56.000
generative network. It has to be built in a way that it is controllable top down. It is not just
link |
00:41:02.000
trying to generate a whole picture at once. You know, it's not trying to generate photorealistic
link |
00:41:07.760
things of the world. You know, you don't have good photorealistic models of the world. Human
link |
00:41:11.520
brains do not. If I, for example, ask you the question, what is the color of the letter E
link |
00:41:17.360
in the Google logo? You have no idea. Although, you have seen it millions of times, hundreds of
link |
00:41:25.360
times. So, our model is not photorealistic, but it has other properties: we can
link |
00:41:32.240
manipulate it. And you can think about filling in a different color in that logo. You can think
link |
00:41:37.840
about expanding the letter E. So you can imagine the consequence of,
link |
00:41:44.400
you know, actions that you have never performed. So, these are the kind of characteristics the
link |
00:41:49.040
generative model need to have. So, this is one constraint that went into our model. Like, you
link |
00:41:52.800
know, when you read just the perception side of the paper, it is not obvious
link |
00:41:57.920
that this was a constraint that went into the model, this top down controllability
link |
00:42:02.720
of the generative model. So, what does top down controllability in a model look like? It's a
link |
00:42:10.480
really interesting concept. Fascinating concept. Is it that the recursiveness gives
link |
00:42:16.000
you that? Or how do you do it? Quite a few things. It's like, what does the model
link |
00:42:22.080
factorize? You know, what is the model representing as different pieces in the
link |
00:42:26.720
puzzle? Like, you know, in the RCN network, the way it thinks of the world, as I said, is that
link |
00:42:33.440
the background of an image is modeled separately from the foreground of the image. So,
link |
00:42:39.040
the objects are separate from the background. They are different entities. So, there's a kind
link |
00:42:43.200
of segmentation that's built in fundamentally. And then even that object is composed of parts.
link |
00:42:49.840
And also, another one is the shape of the object is differently modeled from the texture of the
link |
00:42:57.440
object. Got it. So, you know who François Chollet is? Yeah. So, he
link |
00:43:08.800
developed this IQ test type of thing, the ARC challenge, and it's kind of cool that there's
link |
00:43:16.160
these concepts, priors that he defines that you bring to the table in order to be able to reason
link |
00:43:22.560
about basic shapes and things in IQ tests. So, here you're making it quite explicit that here are the
link |
00:43:30.080
things that you should be, these are like distinct things that you should be able to model in this.
link |
00:43:36.960
Keep in mind that you can derive this from much more general principles. You don't
link |
00:43:42.240
need to explicitly put it in as, oh, objects versus foreground versus background, the surface versus
link |
00:43:48.880
the structure. No, these are derivable from more fundamental principles of, you know,
link |
00:43:55.440
what's the property of continuity of natural signals. What's the property of continuity of
link |
00:44:01.520
natural signals? Yeah. By the way, that sounds very poetic, but yeah. So, you're saying that's a,
link |
00:44:07.920
there's some low level properties from which emerges the idea that shapes should be different
link |
00:44:12.560
than, like, there should be parts of an object. I mean, kind of like François,
link |
00:44:18.640
I mean, there's objectness, there's all these things that it's kind of crazy that we humans,
link |
00:44:25.040
I guess, evolved to have because it's useful for us to perceive the world. Yeah. Correct. And it
link |
00:44:30.240
derives mostly from the properties of natural signals. And so, natural signals. So, natural
link |
00:44:38.080
signals are the kind of things we'll perceive in the natural world. Correct. I don't know. I don't
link |
00:44:43.200
know why that sounds so beautiful. Natural signals. Yeah. As opposed to a QR code, right? Which is an
link |
00:44:48.080
artificial signal that we created. Humans are not very good at classifying QR codes. We are very
link |
00:44:52.880
good at saying something is a cat or a dog, but not very good at, you know, where computers are
link |
00:44:58.480
very good at classifying QR codes. So, our visual system is tuned for natural signals. So,
link |
00:45:05.600
it's tuned for natural signals. And there are fundamental assumptions in the architecture
link |
00:45:11.680
that are derived from natural signals properties. I wonder when you take hallucinogenic drugs,
link |
00:45:18.640
does that go into natural or is that closer to the QR code? It's still natural. It's still natural?
link |
00:45:25.120
Yeah. Because it is still operating using your brains. By the way, on that topic, I mean,
link |
00:45:30.480
I haven't been following. I think they're becoming legalized in certain places. I can't wait
link |
00:45:34.640
till they become legalized to a degree that, like, vision science researchers could study it.
link |
00:45:40.080
Yeah. Just like through medical, chemical ways, modify. There could be ethical concerns, but
link |
00:45:47.600
modify. That's another way to study the brain is to be able to chemically modify it. There's
link |
00:45:53.280
probably a very long way to figure out how to do it ethically. Yeah, but I think there are studies
link |
00:46:01.200
on that already. Yeah, I think so. Because it's not unethical to give it to rats.
link |
00:46:08.080
Oh, that's true. That's true. There's a lot of drugged up rats out there. Okay, cool. Sorry.
link |
00:46:15.600
Sorry. It's okay. So, there's these low level things from natural signals that...
link |
00:46:23.840
...from which these properties will emerge. But it is still a very hard problem on how to encode
link |
00:46:33.840
that. So, you mentioned the priors François wanted to encode in the abstract reasoning challenge,
link |
00:46:44.880
but it is not straightforward how to encode those priors. So, some of those challenges,
link |
00:46:50.960
like the object completion challenges are things that we purely use our visual system to do.
link |
00:46:57.840
It looks like abstract reasoning, but it is purely an output of the vision system. For example,
link |
00:47:03.200
completing the corners of that Kanizsa triangle, completing the lines of that Kanizsa triangle.
link |
00:47:07.120
It's purely a visual system property. There is no abstract reasoning involved. It uses all these
link |
00:47:12.160
priors, but it is stored in our visual system in a particular way that is amenable to inference.
link |
00:47:18.720
That is one of the things that we tackled in the... Basically saying, okay, these are the
link |
00:47:25.440
pieces of prior knowledge that will be derived from the world, but then how is that prior knowledge
link |
00:47:31.440
represented in the model such that inference when some piece of evidence comes in can be
link |
00:47:38.080
done very efficiently and in a very distributed way? Because there are so many ways of representing
link |
00:47:44.640
knowledge, which is not amenable to very quick inference, quick lookups. So that's one core part
link |
00:47:53.840
of what we tackled in the RCN model. How do you encode visual knowledge to do very quick inference?
link |
00:48:02.800
Can you maybe comment on... So folks listening to this in general may be familiar with
link |
00:48:08.560
different kinds of architectures of neural networks.
link |
00:48:10.720
What are we talking about with RCN? What does the architecture look like? What are the different
link |
00:48:16.240
components? Is it close to neural networks? Is it far away from neural networks? What does it look
link |
00:48:20.720
like? Yeah. So you can think of the Delta between the model and a convolutional neural network,
link |
00:48:27.040
if people are familiar with convolutional neural networks. So convolutional neural networks have
link |
00:48:31.440
this feed forward processing cascade, which is called feature detectors and pooling. And that
link |
00:48:37.440
is repeated in a multi level system. And if you want an intuitive idea of what is happening,
link |
00:48:46.320
feature detectors are detecting interesting co-occurrences in the input. It can be a line,
link |
00:48:53.920
a corner, an eye or a piece of texture, et cetera. And the pooling neurons are doing some local
link |
00:49:03.200
transformation of that and making it invariant to local transformations. So this is what the
link |
00:49:07.840
structure of convolutional neural network is. Recursive cortical network has a similar structure
link |
00:49:14.880
when you look at just the feed forward pathway. But in addition to that, it is also structured
link |
00:49:19.600
in a way that it is generative so that you can run it backward and combine the forward with the
link |
00:49:25.680
backward. Another aspect that it has is it has lateral connections. So if you have an edge here
link |
00:49:37.280
and an edge here, it has connections between these edges. It is not just feed forward connections.
link |
00:49:42.080
There is something between these edges, between the nodes representing these edges, which is to
link |
00:49:49.280
enforce compatibility between them. So otherwise what will happen is... Constraints. It's a
link |
00:49:53.920
constraint. It's basically if you do just feature detection followed by pooling, then your
link |
00:50:01.200
transformations in different parts of the visual field are not coordinated. And so you will create
link |
00:50:07.760
a jagged, when you generate from the model, you will create jagged things and uncoordinated
link |
00:50:14.480
transformations. So these lateral connections are enforcing the transformations.
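To make the contrast concrete, here is a bare-bones sketch of the feature-detection-plus-pooling block being described, in plain NumPy with invented filters and a tiny synthetic image. It is not the RCN itself; the lateral connections and the generative backward pass that RCN adds are only indicated in the closing comments.

```python
# Feature detection + pooling, the CNN-style building block contrasted with RCN above.
# Plain NumPy with tiny hand-made filters; purely illustrative, not the RCN model.
import numpy as np

def detect(image, filt):
    """Valid 2D cross-correlation: each output is an 'interesting co-occurrence' score."""
    H, W = image.shape
    h, w = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * filt)
    return out

def max_pool(fmap, size=2):
    """Max over small windows: invariance to small local shifts of the detected feature."""
    H, W = fmap.shape
    return np.array([[fmap[i:i+size, j:j+size].max()
                      for j in range(0, W - size + 1, size)]
                     for i in range(0, H - size + 1, size)])

vertical_edge = np.array([[1.0, -1.0],
                          [1.0, -1.0]])

image = np.zeros((6, 6)); image[:, 3:] = 1.0     # a vertical boundary between columns 2 and 3
shifted = np.zeros((6, 6)); shifted[:, 4:] = 1.0 # the same boundary shifted right by one pixel

pooled_a = max_pool(np.abs(detect(image,   vertical_edge)))
pooled_b = max_pool(np.abs(detect(shifted, vertical_edge)))
print(np.array_equal(pooled_a, pooled_b))  # True: pooling absorbs the one-pixel shift

# RCN additionally ties neighboring pooled edge units together with lateral connections
# (pairwise compatibility factors), so the shifts chosen in adjacent regions stay
# coordinated, and it can run the whole thing backward as a generative model.
```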
link |
00:50:20.160
Is the whole thing still differentiable?
link |
00:50:22.160
No, it's not. It's not trained using backprop.
link |
00:50:27.440
Okay. That's really important. So there's this feed forward, there's feedback mechanisms.
link |
00:50:33.280
There's some interesting connectivity things. It's still layered like multiple layers.
link |
00:50:41.040
Okay. Very, very interesting. And yeah. Okay. So the interconnections between adjacent nodes
link |
00:50:48.240
serve as constraints that keep the thing stable.
link |
00:50:52.880
Correct.
link |
00:50:53.680
Okay. So what else?
link |
00:50:55.840
And then there's this idea of doing inference. A neural network does not do inference on the fly.
link |
00:51:03.120
So an example of why this inference is important is, you know, so one of the first applications
link |
00:51:09.200
that we showed in the paper was to crack text-based CAPTCHAs.
link |
00:51:15.040
What are CAPTCHAs?
link |
00:51:16.000
I mean, by the way, one of the most awesome... like, people don't use this term anymore:
link |
00:51:21.040
human computation, I think. I love this term. The guy who created CAPTCHAs,
link |
00:51:26.640
I think, came up with this term. I love it. Anyway. What are CAPTCHAs?
link |
00:51:32.640
So CAPTCHAs are those things that you fill in when, you know, you're
link |
00:51:38.480
opening a new account in Google, they show you a picture, you know, usually
link |
00:51:43.200
it used to be a set of garbled letters that you have to kind of figure out what that string
link |
00:51:48.720
of characters is and type it. And the reason CAPTCHAs exist is because, you know, Google or Twitter
link |
00:51:56.640
do not want automatic creation of accounts. You can use a computer to create millions of accounts
link |
00:52:03.200
and use that for nefarious purposes. So you want to make sure that to the extent possible,
link |
00:52:10.560
the interaction that their system is having is with a human. So it's called a human
link |
00:52:16.080
interaction proof. A CAPTCHA is a human interaction proof. So CAPTCHAs are, by design,
link |
00:52:23.840
things that are easy for humans to solve, but hard for computers.
link |
00:52:27.360
Hard for robots.
link |
00:52:28.240
Yeah. And text-based CAPTCHAs were the ones that were prevalent around 2014,
link |
00:52:36.320
because at that time, text-based CAPTCHAs were hard for computers to crack. Even now,
link |
00:52:42.240
they are, actually, in the sense that an arbitrary text-based CAPTCHA will be unsolvable even now,
link |
00:52:48.240
but with the techniques that we have developed, it can be, you know, you can quickly develop
link |
00:52:52.320
a mechanism that solves the CAPTCHA.
link |
00:52:55.360
They've probably gotten a lot harder too. They've been getting cleverer and cleverer at
link |
00:53:00.320
generating these text CAPTCHAs. So, okay. So one of the things you've tested it on is these
link |
00:53:06.640
kinds of CAPTCHAs in 2014, '15, that kind of stuff. So, I mean, by the way, why CAPTCHAs?
link |
00:53:15.120
Yeah. Even now, I would say CAPTCHA is a very, very good challenge problem. If you want to
link |
00:53:21.920
understand how human perception works, and if you want to build systems that work,
link |
00:53:27.040
like the human brain. And I wouldn't say CAPTCHA is a solved problem. We have cracked the fundamental
link |
00:53:32.880
defense of CAPTCHAs, but it is not solved in the way that humans solve it. So I can give an example.
link |
00:53:40.000
I can take a five year old child who has just learned characters and show them any new CAPTCHA
link |
00:53:48.640
that we create. They will be able to solve it. I can show you, I can show you a picture of a
link |
00:53:56.400
character. I can show you pretty much any new CAPTCHA from any new website. You'll be able to
link |
00:54:02.000
solve it without getting any training examples from that particular style of CAPTCHA.
link |
00:54:06.640
You're assuming I'm human. Yeah.
link |
00:54:08.000
Yes. Yeah. That's right. So if you are human, otherwise I will be able to figure that out
link |
00:54:15.440
using this one. But this whole podcast is just a Turing test, a long Turing test. Anyway,
link |
00:54:22.000
yeah. So humans can figure it out with very few examples. Or no training examples. No training
link |
00:54:28.880
examples from that particular style of CAPTCHA. So even now this is unreachable for the current
link |
00:54:37.760
deep learning system. So basically there is no, I don't think a system exists where you can
link |
00:54:41.760
basically say, train on whatever you want. And then now say, hey, I will show you a new CAPTCHA,
link |
00:54:47.840
which I did not show you in the training setup. Will the system be able to solve it? It still
link |
00:54:54.160
doesn't exist. So that is the magic of human perception. And Doug Hofstadter put this very
link |
00:55:01.760
beautifully in one of his talks. The central problem in AI is what is the letter A. If you
link |
00:55:11.440
can build a system that reliably can detect all the variations of the letter A, you don't even
link |
00:55:17.600
need to go to the B and the C. Yeah. You don't even need to go to the B and the C or the strings
link |
00:55:23.040
of characters. And so that is the spirit with which we tackle that problem.
link |
00:55:28.880
What does he mean by that? I mean, is it like, without training examples, try to figure out
link |
00:55:36.160
the fundamental elements that make up the letter A in all of its forms?
link |
00:55:43.520
In all of its forms. A can be made with two humans standing, leaning against each other,
link |
00:55:47.920
holding hands. And it can be made of leaves.
link |
00:55:52.080
Yeah. You might have to understand everything about this world in order to understand the
link |
00:55:56.480
letter A. Yeah. Exactly.
link |
00:55:57.920
So it's common sense reasoning, essentially. Yeah.
link |
00:56:00.400
Right. So to finally, to really solve, finally to say that you have solved CAPTCHA,
link |
00:56:07.760
you have to solve the whole problem.
link |
00:56:08.880
Yeah. Okay. So how does this kind of RCN architecture help us to do a better job of that
link |
00:56:18.560
kind of thing? Yeah. So as I mentioned, one of the important things was being able to do inference,
link |
00:56:24.960
being able to dynamically do inference.
link |
00:56:28.640
Can you clarify what you mean? Because you said like neural networks don't do inference.
link |
00:56:33.040
Yeah. So what do you mean by inference in this context then?
link |
00:56:35.840
So, okay. So in CAPTCHAs, what they do to confuse people is to make the characters crowd together.
link |
00:56:43.360
Yes. Okay. And when you make the characters crowd together, what happens is that you will now start
link |
00:56:48.400
seeing combinations of characters as some other new character or an existing character. So you
link |
00:56:53.920
would put an R and N together. It will start looking like an M. And so locally, there is
link |
00:57:02.320
very strong evidence for it being some incorrect character. But globally, the only explanation that
link |
00:57:11.520
fits together is something that is different from what you can find locally. Yes. So this is
link |
00:57:18.240
inference. You are basically taking local evidence and putting it in the global context and often
link |
00:57:25.840
coming to a conclusion that conflicts with the local information.
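To make this explaining-away idea concrete, here is a minimal sketch in Python. It is not the RCN algorithm; the scores and the lexicon are made-up illustrative values. Locally the crowded patch matches "m" best, but only the "rn" reading yields a globally consistent word, so the global explanation wins.

```python
# Toy explaining-away: local template scores prefer "m" for a crowded patch,
# but a global lexicon of whole-string explanations overrides that local evidence.
# All numbers and words here are made up for illustration.

local_score = {"m": 0.7, "rn": 0.5}      # hypothetical local evidence for the patch
lexicon = {"corn", "comb", "morning"}    # hypothetical global vocabulary

def global_score(prefix, reading, suffix):
    # a reading only counts if the whole string it implies is a known word
    return local_score[reading] if prefix + reading + suffix in lexicon else 0.0

readings = {r: global_score("co", r, "") for r in local_score}
print(readings)                          # {'m': 0.0, 'rn': 0.5}
print(max(readings, key=readings.get))   # 'rn': global context explains away the local "m"
```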
link |
00:57:29.920
So actually, so you mean inference like in the way it's used when you talk about reasoning,
link |
00:57:36.560
for example, as opposed to like inference, which is with artificial neural networks,
link |
00:57:42.240
which is a single pass through the network. Okay. So like you're basically doing some basic forms of
link |
00:57:47.840
reasoning, like integration of like how local things fit into the global picture.
link |
00:57:54.480
And things like explaining away come into this one, because you are explaining that piece
link |
00:57:59.840
of evidence as something else, because globally, that's the only thing that makes sense. So now
link |
00:58:08.160
you can amortize this inference in a neural network. If you want to do this, you can brute
link |
00:58:15.600
force it. You can just show it all combinations of things that you want your reasoning to work over.
link |
00:58:23.120
And you can just train the hell out of that neural network and it will look like it is doing inference
link |
00:58:30.880
on the fly, but it is really just doing amortized inference. It is because you have shown it a lot
link |
00:58:37.680
of these combinations during training time. So what you want to do is be able to do dynamic
link |
00:58:43.840
inference rather than just being able to show all those combinations in the training time.
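A rough way to see the amortized-versus-dynamic distinction in code: an amortized system is a single trained function call per input, while dynamic inference iterates at test time until local evidence and global constraints agree. The sketch below is a generic coordinate-ascent loop over two latent character slots with made-up scoring functions; it only illustrates the loop, not the model in the paper.

```python
# Dynamic (iterative, test-time) inference sketch: repeatedly re-pick each latent slot
# given its local evidence and its compatibility with the other slot, instead of
# relying on a single trained forward pass. Scores below are made up for illustration.

def dynamic_inference(patches, hypotheses, unary, pairwise, steps=10):
    state = [hypotheses[0], hypotheses[0]]               # arbitrary initialization
    for _ in range(steps):
        for i in (0, 1):
            other = state[1 - i]
            state[i] = max(hypotheses,
                           key=lambda h: unary(h, patches[i]) + pairwise(h, other))
    return state

# Toy scores: patch 0 locally looks most like "m", but "rn" followed by "n" is the
# only globally compatible pair, so iteration flips patch 0 from "m" to "rn".
local = {("m", 0): 0.7, ("rn", 0): 0.5, ("n", 0): 0.1,
         ("m", 1): 0.2, ("rn", 1): 0.1, ("n", 1): 0.6}
compat = lambda a, b: 1.0 if {a, b} == {"rn", "n"} else 0.0
print(dynamic_inference([0, 1], ["m", "rn", "n"],
                        lambda h, p: local[(h, p)], compat))   # ['rn', 'n']
```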
link |
00:58:48.480
And that's something we emphasized in the model. What does it mean, dynamic inference? Is that
link |
00:58:54.080
that has to do with the feedback thing? Yes. Like what is dynamic? I'm trying to visualize what
link |
00:59:00.320
dynamic inference would be in this case. Like what is it doing with the input? It's shown the input
link |
00:59:05.920
the first time. Yeah. And is like what's changing over temporally? What's the dynamics of this
link |
00:59:13.840
inference process? So you can think of it as you have at the top of the model, the characters that
link |
00:59:19.840
you are trained on. They are the causes that you are trying to explain the pixels using the
link |
00:59:26.720
characters as the causes. The characters are the things that cause the pixels. Yeah. So there's
link |
00:59:33.600
this causality thing. So the reason you mentioned causality, I guess, is because there's a temporal
link |
00:59:38.960
aspect to this whole thing. In this particular case, the temporal aspect is not important.
link |
00:59:43.280
It is more like when if I turn the character on, the pixels will turn on. Yeah, it will be after
link |
00:59:50.000
this a little bit. Okay. So that is causality in the sense of like a logic causality, like
link |
00:59:55.520
hence inference. Okay. The dynamics is that even though locally it will look like, okay, this is an
link |
01:00:03.200
A. And locally, just when I look at just that patch of the image, it looks like an A. But when I look
link |
01:00:11.280
at it in the context of all the other causes, A is not something that makes sense. So that is
link |
01:00:17.600
something you have to kind of recursively figure out. Yeah. So, okay. And this thing performed
link |
01:00:24.720
pretty well on the CAPTCHAs. Correct. And I mean, is there some kind of interesting intuition you
link |
01:00:32.080
can provide why it did well? Like what did it look like? Is there visualizations that could be human
link |
01:00:37.840
interpretable to us humans? Yes. Yeah. So the good thing about the model is that it is extremely,
link |
01:00:44.320
so it is not just doing a classification, right? It is providing a full explanation for the scene.
link |
01:00:50.400
So when it operates on a scene, it is coming back and saying, look, this part is the A,
link |
01:00:59.600
and these are the pixels that turned on. These are the pixels in the input that make me think that
link |
01:01:06.880
it is an A. And also, these are the portions I hallucinated. It provides a complete explanation
link |
01:01:14.640
of that form. And then these are the contours. This is the interior. And this is in front of
link |
01:01:21.360
this other object. So that's the kind of explanation the inference network provides.
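The "full explanation" being described is a structured parse of the scene rather than a bare label. As a sketch of what such an output might contain (the field names are illustrative, not the paper's actual API):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A sketch of a structured scene "explanation", as opposed to a single class label.

@dataclass
class LetterExplanation:
    label: str                                         # e.g. "A"
    supporting_pixels: List[Tuple[int, int]]           # evidence observed in the input
    hallucinated_pixels: List[Tuple[int, int]]         # parts the model filled in (occluded)
    contour: List[Tuple[int, int]]                     # outline of the letter
    interior: List[Tuple[int, int]]                    # surface/interior region
    occludes: List[str] = field(default_factory=list)  # objects this letter sits in front of

@dataclass
class SceneExplanation:
    letters: List[LetterExplanation]                   # the whole scene, letter by letter
```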
link |
01:01:28.400
So that is useful and interpretable. And then the kind of errors it makes are also,
link |
01:01:40.000
I don't want to read too much into it, but the kind of errors the network makes are very similar
link |
01:01:47.040
to the kinds of errors humans would make in a similar situation. So there's something about
link |
01:01:51.120
the structure that feels reminiscent of the way the human visual system works. Well, I mean,
link |
01:02:00.240
how hardcoded is this to the CAPTCHA problem, this idea?
link |
01:02:04.320
Not really hardcoded because the assumptions, as I mentioned, are general, right? It is more,
link |
01:02:11.280
and those themselves can be applied in many situations which are natural signals. So it's
link |
01:02:17.680
the foreground versus background factorization and the factorization of the surfaces versus
link |
01:02:24.320
the contours. So these are all generally applicable assumptions.
link |
01:02:27.600
In all vision. So why attack the CAPTCHA problem, which is quite unique in the computer vision
link |
01:02:36.000
context versus like the traditional benchmarks of ImageNet and all those kinds of image
link |
01:02:42.800
classification or even segmentation tasks and all of that kind of stuff. What's your thinking about
link |
01:02:49.120
those kinds of benchmarks in this context? I mean, those benchmarks are useful for deep
link |
01:02:55.760
learning kind of algorithms. So the settings that deep learning works in are here is my huge
link |
01:03:03.600
training set and here is my test set. So the training set is almost 100x, 1000x bigger than
link |
01:03:10.480
the test set in many, many cases. What we wanted to do was invert that. The training set is way
link |
01:03:18.480
smaller than the test set. And CAPTCHA is a problem that is by definition hard for computers
link |
01:03:30.080
and it has these good properties of strong generalization, strong out of training distribution
link |
01:03:36.640
generalization. If you are interested in studying that and having your model have that property,
link |
01:03:44.480
then it's a good data set to tackle. So have you attempted to, which I think,
link |
01:03:49.840
I believe there's quite a growing body of work on looking at MNIST and ImageNet without training.
link |
01:03:58.080
So it's like, the basic challenge is: what tiny fraction of the training set can we take in
link |
01:04:05.760
order to do a reasonable job of the classification task? Have you explored that angle in these
link |
01:04:13.680
classic benchmarks? Yes. So we did do MNIST. So it's not just CAPTCHA. There were also
link |
01:04:23.440
multiple versions of MNIST, including the standard version where we inverted the problem,
link |
01:04:28.720
which is basically saying, rather than train on 60,000 training examples, how quickly can you get
link |
01:04:37.200
to high level accuracy with very little training data? Is there some performance you remember,
link |
01:04:42.080
like how well did it do? How many examples did it need? Yeah. I remember that it was
link |
01:04:50.400
on the order of tens or hundreds of examples to get into 95% accuracy. And it was definitely
link |
01:05:00.880
better than the other systems out there at that time.
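The "inverted" protocol is easy to reproduce in spirit: fix a large test set and sweep a tiny training budget. Below is a sketch using scikit-learn's small 8x8 digits set and a nearest-neighbor baseline as a stand-in model; the tens-to-hundreds-of-examples figure quoted above refers to the RCN-style model on MNIST, not to this baseline.

```python
# Sketch of the inverted evaluation: accuracy as a function of a tiny training budget,
# with the test set much larger than the training set.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)          # small 8x8 digit images, a stand-in for MNIST
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
train_pool, test = idx[:1000], idx[1000:]    # training pool vs. held-out test set

for n_train in (10, 50, 100, 500):
    subset = train_pool[:n_train]
    clf = KNeighborsClassifier(n_neighbors=1).fit(X[subset], y[subset])
    acc = clf.score(X[test], y[test])
    print(f"{n_train:4d} training examples -> accuracy {acc:.2f}")
```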
link |
01:05:03.840
At that time. Yeah. They're really pushing. I think that's a really interesting space,
link |
01:05:07.920
actually. I think there's an actual name for that MNIST variant. There are different names for the different
link |
01:05:17.360
sizes of training sets. I mean, people are like attacking this problem. I think it's
link |
01:05:21.600
super interesting. It's funny how like the MNIST will probably be with us all the way to AGI.
link |
01:05:29.760
It's a data set that just sticks around. It's a clean, simple data set to study the fundamentals of
link |
01:05:37.680
learning, just like CAPTCHAs. It's interesting. Not enough people... I don't know. Maybe you can
link |
01:05:43.280
correct me, but I feel like CAPTCHAs don't show up as often in papers as they probably should.
link |
01:05:48.240
That's correct. Yeah. Because usually these things have a momentum. Once something gets
link |
01:05:56.640
established as a standard benchmark, there is a dynamic of how graduate students operate and how the
link |
01:06:06.000
academic system works that pushes people to attack that benchmark.
link |
01:06:10.640
Yeah. Nobody wants to think outside the box. Okay. Okay. So good performance on the CAPTCHAs.
link |
01:06:20.480
What else is there interesting on the RCN side before we talk about the cortical microcircuits?
link |
01:06:25.520
Yeah. So the same model. So the important part of the model was that it trains very
link |
01:06:31.760
quickly with very little training data and it's quite robust to out of distribution
link |
01:06:37.440
perturbations. And we are using that very fruitfully at Vicarious in many of the
link |
01:06:45.760
robotics tasks we are solving. Well, let me ask you this kind of touchy question. I have to,
link |
01:06:51.840
I've spoken with your friend, colleague, Jeff Hawkins, too. I have to kind of ask,
link |
01:06:59.520
there is a bit of, whenever you have brain inspired stuff and you make big claims,
link |
01:07:05.680
big sexy claims, there's critics, I mean, machine learning subreddit, don't get me started on those
link |
01:07:14.720
people. Criticism is good, but they're a bit over the top. There is quite a bit of sort of
link |
01:07:23.680
skepticism and criticism. Is this work really as good as it promises to be? Do you have thoughts
link |
01:07:31.040
on that kind of skepticism? Do you have comments on the kind of criticism it might have received
link |
01:07:36.800
about, you know, is this approach legit? Is this a promising approach? Or at least as promising as
link |
01:07:44.880
it seems to be, you know, advertised as? Yeah, I can comment on it. So, you know, our RCN paper
link |
01:07:52.480
is published in Science, which I would argue is a very high quality journal, very hard to publish
link |
01:07:58.560
in. And, you know, usually it is indicative of the quality of the work. And I am very,
link |
01:08:08.160
very certain that the ideas that we brought together in that paper, in terms of the importance
link |
01:08:13.760
of feedback connections, recursive inference, lateral connections, coming to best explanation
link |
01:08:20.160
of the scene as the problem to solve, trying to solve recognition, segmentation, all jointly,
link |
01:08:27.360
in a way that is compatible with higher level cognition, top down attention, all those ideas
link |
01:08:31.920
that we brought together into something, you know, coherent and workable in the world and
link |
01:08:36.000
solving a challenging, tackling a challenging problem. I think that will stay and that
link |
01:08:40.880
contribution I stand by. Now, I can tell you a story which is funny in the context of this. So,
link |
01:08:49.360
if you read the abstract of the paper and, you know, the argument we are putting in, you know,
link |
01:08:53.360
we are putting in, look, current deep learning systems take a lot of training data. They don't
link |
01:08:59.120
use these insights. And here is our new model, which is not a deep neural network. It's a
link |
01:09:03.760
graphical model. It does inference. This is how the paper is, right? Now, once the paper was
link |
01:09:08.560
accepted and everything, it went to the press department in Science, you know, AAAS Science
link |
01:09:14.800
Office. We didn't do any press release when it was published. It went to the press department.
link |
01:09:18.880
What was the press release that they wrote up? A new deep learning model.
link |
01:09:24.880
Solves CAPTCHAs.
link |
01:09:25.920
Solves CAPTCHAs. And so, you can see where was, you know, what was being hyped in that thing,
link |
01:09:32.400
right? So, there is a dynamic in the community of, you know, so that especially happens when
link |
01:09:42.160
there are lots of new people coming into the field and they get attracted to one thing.
link |
01:09:46.720
And some people are trying to think different compared to that. So, there is some, I think
link |
01:09:52.560
skepticism in science is important and it is, you know, very much required. But also,
link |
01:09:59.360
it's not skepticism, usually. It's mostly a bandwagon effect that is happening rather than skepticism.
link |
01:10:05.200
Well, but that's not even that. I mean, I'll tell you what they react to, which is like,
link |
01:10:09.760
I'm sensitive to as well. If you look at just companies, OpenAI, DeepMind, Vicarious, I mean,
link |
01:10:16.960
they just, there's a little bit of a race to the top and hype, right? It's like, it doesn't pay off
link |
01:10:27.520
to be humble. So, like, and the press is just irresponsible often. They just, I mean, don't
link |
01:10:37.600
get me started on the state of journalism today. Like, it seems like the people who write articles
link |
01:10:42.880
about these things, they literally have not even spent an hour on the Wikipedia article about what
link |
01:10:49.280
neural networks are. Like, they haven't even invested in learning the language, just out of laziness.
link |
01:10:56.160
It's like, robots beat humans. Like, they write this kind of stuff that just, and then of course,
link |
01:11:06.800
the researchers are quite sensitive to that because it gets a lot of attention. They're like,
link |
01:11:11.760
why did this work get so much attention? That's over the top and people get really sensitive.
link |
01:11:18.240
The same kind of criticism: OpenAI did work with the Rubik's cube with the robot, which people
link |
01:11:24.080
criticized. Same with GPT2 and 3, they criticized. Same thing with DeepMind with AlphaZero. I mean,
link |
01:11:33.120
yeah, I'm sensitive to it. But, and of course, with your work, you mentioned deep learning, but
link |
01:11:39.280
there's something super sexy to the public about brain inspired. I mean, that immediately grabs
link |
01:11:45.520
people's imagination, not even like neural networks, but like really brain inspired, like
link |
01:11:53.600
brain like neural networks. That seems really compelling to people and to me as well, to the
link |
01:12:00.480
world as a narrative. And so people hook onto that. And sometimes the skepticism engine
link |
01:12:10.400
turns on in the research community and they're skeptical. But I think putting aside the ideas
link |
01:12:17.600
of the actual performance and captures or performance in any data set. I mean, to me,
link |
01:12:22.480
all these data sets are useless anyway. It's nice to have them. But in the grand scheme of things,
link |
01:12:28.720
they're silly toy examples. The point is, is there intuition about the ideas, just like you
link |
01:12:36.080
mentioned, bringing the ideas together in a unique way? Is there something there? Is there some value
link |
01:12:42.400
there? And is it going to stand the test of time? And that's the hope. That's the hope.
link |
01:12:46.400
Yes. My confidence there is very high. I don't treat brain inspired as a marketing term.
link |
01:12:53.440
I am looking into the details of biology and puzzling over those things and I am grappling
link |
01:13:01.920
with those things. And so it is not a marketing term at all. You can use it as a marketing term
link |
01:13:07.600
and people often use it, and you can get lumped together with them. And when people don't understand
link |
01:13:13.680
how you're approaching the problem, it is easy to be misunderstood and think of it as purely
link |
01:13:20.480
marketing. But that's not the way we are. So you really, I mean, as a scientist,
link |
01:13:27.120
you believe that if we kind of just stick to really understanding the brain, that's going to,
link |
01:13:33.760
that's the right, like you should constantly meditate on the, how does the brain do this?
link |
01:13:39.440
Because that's going to be really helpful for engineering and technology systems.
link |
01:13:43.520
Yes. You need to, so I think it's one input and it is helpful, but you should know when to deviate
link |
01:13:51.680
from it too. So an example is convolutional neural networks, right? Convolution is not an
link |
01:13:59.120
operation the brain implements. The visual cortex is not convolutional. The visual cortex has local
link |
01:14:06.240
receptive fields, local connectivity, but there is no translation invariance in the network weights
link |
01:14:18.640
in the visual cortex. That is a computational trick, which is a very good engineering trick
link |
01:14:24.080
that we use for sharing the training between the different nodes. And that trick will be with us
link |
01:14:31.840
for some time. It will go away when we have robots with eyes and heads that move. And so then that
link |
01:14:41.600
trick will go away. It will not be useful at that time. So the brain doesn't have translational
link |
01:14:49.040
invariance. It has the focal point, like it has a thing it focuses on. Correct. It has a fovea.
link |
01:14:54.720
And because of the fovea, the receptive fields are not like copies of the weights. Like the
link |
01:15:01.920
weights in the center are very different from the weights in the periphery. Yes. At the periphery.
link |
01:15:05.760
I mean, I actually wrote a paper and just got a chance to really study peripheral
link |
01:15:12.720
vision, which is a fascinating thing. A very poorly understood thing, of what the brain, you know,
link |
01:15:21.600
at every level the brain does with the periphery. It does some funky stuff. Yeah. So it's another
link |
01:15:28.240
kind of trick than convolution. Like, you know, convolution in neural networks is
link |
01:15:39.040
a trick for efficiency, an efficiency trick. And the brain does a whole other kind of thing.
link |
01:15:44.160
Correct. So you need to understand the principles of processing so that you can still apply
link |
01:15:51.280
engineering tricks where you want to. You don't want to be slavishly mimicking all the things of
link |
01:15:55.840
the brain. And so, yeah, so it should be one input. And I think it is extremely helpful,
link |
01:16:02.000
but it should be the point of really understanding so that you know when to deviate from it.
link |
01:16:06.720
So, okay. That's really cool. That's work from a few years ago. You did work at Numenta with Jeff
link |
01:16:14.560
Hawkins with hierarchical temporal memory. How is your just, if you could give a brief history,
link |
01:16:23.040
how has your view of the models of the brain changed over the past few years leading up
link |
01:16:30.240
to now? Is there some interesting aspects where there was an adjustment to your understanding of
link |
01:16:36.960
the brain or is it all just building on top of each other? In terms of the higher level ideas,
link |
01:16:42.720
especially the ones Jeff wrote about in the book, if you blur out, right. Yeah. On Intelligence.
link |
01:16:47.920
Right. On Intelligence. If you blur out the details and if you just zoom out and look at the
link |
01:16:52.560
higher level idea, things are, I would say, consistent with what he wrote about. But many
link |
01:17:02.320
things will be consistent with that because it's a blur. Deep learning systems are also
link |
01:17:08.160
multi level, hierarchical, all of those things. But in terms of the detail, a lot of things are
link |
01:17:16.960
different. And those details matter a lot. So one point of difference I had with Jeff was how to
link |
01:17:28.000
approach, how much of biological plausibility and realism do you want in the learning algorithms?
link |
01:17:36.080
So when I was there, this was almost 10 years ago now.
link |
01:17:41.520
Time flies when you're having fun.
link |
01:17:43.760
Yeah. I don't know what Jeff thinks now, but 10 years ago, the difference was that
link |
01:17:49.760
I did not want to be so constrained on saying my learning algorithms need to be
link |
01:17:56.880
biologically plausible based on some filter of biological plausibility available at that time.
link |
01:18:03.200
To me, that is a dangerous cut to make because we are discovering more and more things about
link |
01:18:09.200
the brain all the time. New biophysical mechanisms, new channels are being discovered
link |
01:18:14.560
all the time. So I don't want to upfront kill off a learning algorithm just because we don't
link |
01:18:21.360
really understand the full biophysics or whatever of how the brain learns.
link |
01:18:27.680
Exactly. Exactly.
link |
01:18:29.120
Let me ask and I'm sorry to interrupt. What's your sense? What's our best understanding of
link |
01:18:34.720
how the brain learns?
link |
01:18:36.000
So things like backpropagation, credit assignment. So many of these algorithms have,
link |
01:18:42.720
learning algorithms have things in common, right? Backpropagation is one way of
link |
01:18:47.600
credit assignment. There is another algorithm called expectation maximization, which is,
link |
01:18:52.560
you know, another weight adjustment algorithm.
link |
01:18:55.520
But is it your sense the brain does something like this?
link |
01:18:58.320
Has to. There is no way around it in the sense of saying that you do have to adjust the
link |
01:19:04.960
connections.
link |
01:19:06.240
So yeah, and you're saying credit assignment, you have to reward the connections that were
link |
01:19:09.600
useful in making a correct prediction and not, yeah, I guess what else, but yeah, it
link |
01:19:14.320
doesn't have to be differentiable.
link |
01:19:16.800
Yeah, it doesn't have to be differentiable. Yeah. But you have to have a, you know, you
link |
01:19:22.320
have a model that you start with, you have data comes in and you have to have a way of
link |
01:19:27.760
adjusting the model such that it better fits the data. So that is all of learning, right?
link |
01:19:33.920
And some of them can be using backprop to do that. Some of it can be using, you know,
link |
01:19:40.400
very local graph changes to do that.
link |
01:19:45.360
That can be, you know, many of these learning algorithms have similar update properties
link |
01:19:52.160
locally in terms of what the neurons need to do locally.
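One way to see the point that these algorithms share an outer loop: learning is "adjust the parameters so the model fits the data better", and backprop versus a more local rule only changes how each per-step update is computed. The sketch below is illustrative; the sign-based rule is a toy, not a claim about what cortex actually does.

```python
import numpy as np

# Learning = adjust a model so it better fits incoming data. The outer loop is shared;
# only the update rule differs between a backprop-style and a more "local" algorithm.

def train(update_rule, steps=2000, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=3)                   # tiny linear model: prediction = w . x
    true_w = np.array([1.0, -2.0, 0.5])      # the function generating the data
    for _ in range(steps):
        x = rng.normal(size=3)
        error = true_w @ x - w @ x           # prediction error on this sample
        w = w + lr * update_rule(x, error)   # only this line changes per algorithm
    return w

def gradient_update(x, error):
    return error * x                         # negative gradient of squared error (backprop case)

def sign_update(x, error):
    return np.sign(error) * np.sign(x)       # a crude, purely local-looking rule (illustrative)

print(train(gradient_update))                # lands close to [1.0, -2.0, 0.5]
print(train(sign_update))                    # noisier, but also drifts toward the target
```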
link |
01:19:57.200
I wonder if small differences in learning algorithms can have huge differences in the
link |
01:20:01.120
actual effect. So the dynamics of, I mean, sort of the reverse like spiking, like if
link |
01:20:09.920
credit assignment is like a lightning versus like a rainstorm or something, like whether
link |
01:20:18.480
there's like a looping local type of situation with the credit assignment, whether there is
link |
01:20:26.240
like regularization, like how it injects robustness into the whole thing, like whether
link |
01:20:34.720
it's chemical or electrical or mechanical. Yeah. All those kinds of things. I feel like
link |
01:20:42.080
it, that, yeah, I feel like those differences could be essential, right? It could be. It's
link |
01:20:48.800
just that you don't know enough to, on the learning side, you don't know, you don't know
link |
01:20:54.880
enough to say that is definitely not the way the brain does it. Got it. So you don't want
link |
01:20:59.840
to be stuck to it. So that, yeah. So you've been open minded on that side of things.
link |
01:21:04.800
On the inference side, on the recognition side, I am much more, I'm able to be constrained
link |
01:21:09.920
because it's much easier to do experiments because, you know, it's like, okay, here's
link |
01:21:13.600
the stimulus, you know, how many steps did it take to get the answer? I can trace it
link |
01:21:18.000
back. I can understand the speed of that computation, et cetera. I'm able to do
link |
01:21:23.120
all of that much more readily on the inference side. Got it. And
link |
01:21:28.400
then you can't do good experiments on the learning side. Correct. So let's go right
link |
01:21:34.880
back into the cortical microcircuits. So what are these ideas beyond the recursive cortical
link |
01:21:42.080
network that you're looking at now? So we have made a, you know, pass through multiple
link |
01:21:48.960
of the steps that, you know, as I mentioned earlier, you know, we were looking at perception
link |
01:21:54.480
from the angle of cognition, right? It was not just perception for perception's sake.
link |
01:21:58.720
How do you, how do you connect it to cognition? How do you learn concepts and how do you learn
link |
01:22:04.400
abstract reasoning? Similar to some of the things Francois talked about, right? So we
link |
01:22:13.280
have taken one pass through it basically saying, what is the basic cognitive architecture that
link |
01:22:19.600
you need to have, which has a perceptual system, which has a system that learns dynamics of
link |
01:22:25.120
the world and then has something like a routine program learning system on top of it to learn
link |
01:22:32.240
concepts. So we have built one, you know, version 0.1 of that system. This
link |
01:22:38.320
was another Science Robotics paper. The title of that paper was, you know, something
link |
01:22:44.640
like cognitive programs. How do you build cognitive programs? And the application there
link |
01:22:49.760
was on manipulation, robotic manipulation? It was, so think of it like this. Suppose
link |
01:22:56.960
you wanted to tell a new person that you met, you don't know the language that person uses.
link |
01:23:04.800
You want to communicate to that person to achieve some task, right? So I want to say,
link |
01:23:10.080
hey, you need to pick up all the red cups from the kitchen counter and put it here, right?
link |
01:23:17.280
How do you communicate that, right? You can show pictures. You can basically say, look,
link |
01:23:21.920
this is the starting state. The things are here. This is the ending state. And what does
link |
01:23:28.080
the person need to understand from that? The person needs to understand what conceptually
link |
01:23:32.400
happened in those pictures from the input to the output, right? So we are looking at
link |
01:23:39.120
preverbal conceptual understanding. Without language, how do you have a set of concepts
link |
01:23:45.360
that you can manipulate in your head? And from a set of images of input and output,
link |
01:23:52.240
can you infer what is happening in those images?
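One way to picture this "infer what conceptually happened" step is as a search for a short program over symbolic primitives that maps the before-state to the after-state. The state encoding and the single primitive below are invented for illustration; the actual cognitive-programs work operates over a learned visual and motor substrate rather than hand-written dictionaries.

```python
# Toy concept induction from an input/output pair: find a program over hypothetical
# primitives that explains the observed change.

from itertools import product

# world state: object -> (color, location)
before = {"cup1": ("red", "counter"), "cup2": ("red", "counter"), "cup3": ("blue", "counter")}
after  = {"cup1": ("red", "table"),   "cup2": ("red", "table"),   "cup3": ("blue", "counter")}

def move_all(state, color, src, dst):
    # primitive: move every object of the given color from src to dst
    return {o: (c, dst if (c == color and loc == src) else loc)
            for o, (c, loc) in state.items()}

colors, places = ("red", "blue"), ("counter", "table")
programs = [("move_all", c, s, d) for c, s, d in product(colors, places, places) if s != d]

explanations = [p for p in programs if move_all(before, *p[1:]) == after]
print(explanations)   # [('move_all', 'red', 'counter', 'table')]
```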
link |
01:23:55.600
Got it. With concepts that are pre-language. Okay. So what does it mean for a concept to be pre-language?
link |
01:24:02.400
Like why is language so important here?
link |
01:24:10.080
So I want to make a distinction between concepts that are just learned from text
link |
01:24:17.520
by just feeding brute force text. You can start extracting things like, okay,
link |
01:24:23.440
a cow is likely to be on grass. So those kinds of things, you can extract purely from text.
link |
01:24:32.160
But that's kind of a simple association thing rather than a concept as an abstraction of
link |
01:24:37.520
something that happens in the real world in a grounded way that I can simulate it in my
link |
01:24:44.480
mind and connect it back to the real world. And you think kind of the visual world,
link |
01:24:51.200
concepts in the visual world are somehow lower level than just the language?
link |
01:24:58.800
The lower level kind of makes it feel like, okay, that's unimportant. It's more like,
link |
01:25:04.720
I would say the concepts in the visual and the motor system and the concept learning system,
link |
01:25:15.440
which if you cut off the language part, just what we learn by interacting with the world
link |
01:25:20.320
and abstractions from that, that is a prerequisite for any real language understanding.
link |
01:25:26.480
So you disagree with Chomsky because he says language is at the bottom of everything.
link |
01:25:32.080
No, I disagree with Chomsky completely on so many levels, from universal grammar to...
link |
01:25:39.680
So that was a paper in Science Robotics, beyond the recursive cortical network.
link |
01:25:43.120
What other interesting problems are there, the open problems and brain inspired approaches
link |
01:25:50.480
that you're thinking about?
link |
01:25:51.600
I mean, everything is open, right? No problem is fully solved. I think of perception as kind of
link |
01:26:02.080
the first thing that you have to build, but the last thing that will actually be solved.
link |
01:26:07.760
Because if you do not build the perception system in the right way, you cannot build the concept system in
link |
01:26:12.880
the right way. So you have to build a perception system, however wrong that might be, you have to
link |
01:26:18.560
still build that and learn concepts from there and then keep iterating. And finally, perception
link |
01:26:24.880
will get solved fully when perception, cognition, language, all those things work together finally.
link |
01:26:30.240
So great, we've talked a lot about perception, but then maybe on the concept side and like common
link |
01:26:37.920
sense or just general reasoning side, is there some intuition you can draw from the brain about
link |
01:26:45.280
how we can do that?
link |
01:26:46.880
So I have this classic example I give. So suppose I give you a few sentences and then ask you a
link |
01:26:56.560
question following that sentence. This is a natural language processing problem, right? So here
link |
01:27:01.920
it goes. I'm telling you, Sally pounded a nail on the ceiling. Okay, that's a sentence. Now I'm
link |
01:27:10.400
asking you a question. Was the nail horizontal or vertical?
link |
01:27:14.080
Vertical.
link |
01:27:15.040
Okay, how did you answer that?
link |
01:27:16.400
Well, I imagined Sally, it was kind of hard to imagine what the hell she was doing, but I
link |
01:27:24.960
imagined I had a visual of the whole situation.
link |
01:27:28.320
Exactly, exactly. So here, you know, I pose a question in natural language. The answer to
link |
01:27:34.400
that question was you got the answer from actually simulating the scene. Now I can go more and more
link |
01:27:40.720
detailed about, okay, was Sally standing on something while doing this? Could she have been
link |
01:27:47.280
standing on a light bulb to do this? I could ask more and more questions about this and I can ask,
link |
01:27:53.360
make you simulate the scene in more and more detail, right? Where is all that knowledge that
link |
01:27:59.200
you're accessing stored? It is not in your language system. It was not just by reading
link |
01:28:05.600
text, you got that knowledge. It is stored from the everyday experiences that you have had from,
link |
01:28:12.320
and by the age of five, you have pretty much all of this, right? And it is stored in your visual
link |
01:28:18.720
system, motor system in a way such that it can be accessed through language.
link |
01:28:24.480
Got it. I mean, right. So the language is just almost sort of the query into the whole visual
link |
01:28:30.000
cortex and that does the whole feedback thing. But I mean, it is all reasoning kind of connected to
link |
01:28:36.800
the perception system in some way. You can do a lot of it. You know, you can still do a lot of it
link |
01:28:43.920
by quick associations without having to go into the depth. And most of the time you will be right,
link |
01:28:49.760
right? You can just do quick associations, but I can easily create tricky situations for you.
link |
01:28:55.440
Where that quick association is wrong and you have to actually run the simulation.
link |
01:29:00.080
So figuring out how these concepts connect. Do you have a good idea of how to do that?
link |
01:29:06.800
That's exactly one of the problems that we are working on. And the way we are approaching that
link |
01:29:13.760
is basically saying, okay, you need to... so the takeaway is that language
link |
01:29:20.400
is simulation control, and your perceptual plus motor system is building a simulation of the world.
link |
01:29:28.960
And so that's basically the way we are approaching it. And the first thing that we built was a
link |
01:29:34.720
controllable perceptual system. And we built Schema Networks, which was a controllable dynamic
link |
01:29:40.160
system. Then we built a concept learning system that puts all these things together
link |
01:29:44.960
into programs or abstractions that you can run and simulate. And now we are taking the step
link |
01:29:51.600
of connecting it to language. And it will be very simple examples. Initially, it will not be
link |
01:29:57.760
the GPT3 like examples, but it will be grounded simulation based language.
link |
01:30:02.640
And for like the querying would be like question answering kind of thing?
link |
01:30:08.400
Correct. Correct. And so that's what we're trying to do. We're trying to build a system
link |
01:30:13.600
kind of thing. Correct. Correct. And it will be in some simple world initially, you know,
link |
01:30:19.120
but it will be about, okay, can the system connect the language and ground it in the right way and
link |
01:30:25.280
run the right simulations to come up with the answer. And the goal is to try to do things that,
link |
01:30:29.600
for example, GPT3 couldn't do. Correct. Speaking of which, if we could talk about GPT3 a little
link |
01:30:38.720
bit, I think it's an interesting thought provoking set of ideas that OpenAI is pushing forward. I
link |
01:30:46.080
think it's good for us to talk about the limits and the possibilities in the neural network. So
link |
01:30:51.360
in general, what are your thoughts about this recently released very large 175 billion parameter
link |
01:30:58.800
language model? So I haven't directly evaluated it yet. From what I have seen on Twitter and
link |
01:31:05.600
other people evaluating it, it looks very intriguing. I am very intrigued by some of
link |
01:31:09.840
the properties it is displaying. And of course the text generation part of that was already
link |
01:31:17.360
evident in GPT2 that it can generate coherent text over long distances. But of course the
link |
01:31:26.480
weaknesses are also pretty visible in saying that, okay, it is not really carrying a world state
link |
01:31:32.000
around. And sometimes you get sentences like, I went up the hill to reach the valley or the thing
link |
01:31:39.200
like some completely incompatible statements, or when you're traveling from one place to the other,
link |
01:31:46.080
it doesn't take into account the time of travel, things like that. So those things I think will
link |
01:31:50.800
happen less in GPT3 because it is trained on even more data and it can do even longer distance
link |
01:31:59.040
coherence. But it will still have the fundamental limitations that it doesn't have a world model
link |
01:32:07.600
and it can't run simulations in its head to find whether something is true in the world or not.
link |
01:32:13.280
So it's taking a huge amount of text from the internet and forming a compressed representation.
link |
01:32:20.400
Do you think in that could emerge something that's an approximation of a world model,
link |
01:32:27.600
which essentially could be used for reasoning? I'm not talking about GPT3, I'm talking about GPT4,
link |
01:32:35.920
5 and GPT10. Yeah, I mean they will look more impressive than GPT3. So if you take that to
link |
01:32:42.320
the extreme then a Markov chain of just first order and if you go to, I'm taking the other
link |
01:32:51.520
extreme, if you read Shannon's book, he has a model of English text which is based on first
link |
01:32:59.200
order Markov chains, second order Markov chains, third order Markov chains and saying that okay,
link |
01:33:03.120
third order Markov chains look better than first order Markov chains. So does that mean a first
link |
01:33:09.600
order Markov chain has a model of the world? Yes, it does. So yes, in that level when you go higher
link |
01:33:18.160
order models or more sophisticated structure in the model like the transformer networks have,
link |
01:33:24.160
yes they have a model of the text world, but that is not a model of the world. It's a model
link |
01:33:32.640
of the text world and it will have interesting properties and it will be useful, but just scaling
link |
01:33:41.120
it up is not going to give us AGI or natural language understanding or meaning. Well the
link |
01:33:49.280
question is whether being forced to compress a very large amount of text forces you to construct
link |
01:33:58.880
things that are very much like, because the ideas of concepts and meaning is a spectrum.
link |
01:34:06.800
Sure, yeah. So in order to form that kind of compression,
link |
01:34:13.920
maybe it will be forced to figure out abstractions which look awfully a lot like the kind of things
link |
01:34:24.160
that we think about as concepts, as world models, as common sense. Is that possible?
link |
01:34:31.120
No, I don't think it is possible because the information is not there.
link |
01:34:34.320
The information is there behind the text, right?
link |
01:34:38.640
No, unless somebody has written down all the details about how everything works in the world
link |
01:34:44.400
to the absurd amounts like, okay, it is easier to walk forward than backward, that you have to open
link |
01:34:51.040
the door to go out of the thing, doctors wear underwear. Unless all these things somebody
link |
01:34:56.560
has written down somewhere or somehow the program found it to be useful for compression from some
link |
01:35:01.680
other text, the information is not there. So that's an argument that text is a lot
link |
01:35:07.840
lower fidelity than the experience of our physical world.
link |
01:35:13.040
Right, correct. A picture is worth a thousand words.
link |
01:35:17.440
Well, in this case, pictures aren't really... So the richest aspect of the physical world isn't
link |
01:35:24.080
even just pictures, it's the interactivity with the world.
link |
01:35:28.240
Exactly, yeah.
link |
01:35:29.200
It's being able to interact. It's almost like...
link |
01:35:36.720
It's almost like if you could interact... Well, maybe I agree with you that pictures
link |
01:35:42.880
worth a thousand words, but a thousand...
link |
01:35:45.760
It's still... Yeah, you could capture it with the GPTX.
link |
01:35:49.760
So I wonder if there's some interactive element where a system could live in text world where it
link |
01:35:54.400
could be part of the chat, be part of talking to people. It's interesting. I mean, fundamentally...
link |
01:36:03.040
So you're making a statement about the limitation of text. Okay, so let's say we have a text
link |
01:36:10.960
corpus that includes basically every experience we could possibly have. I mean, just a very large
link |
01:36:19.280
corpus of text and also interactive components. I guess the question is whether the neural network
link |
01:36:25.440
architecture, these very simple transformers, but if they had like hundreds of trillions or
link |
01:36:33.200
whatever comes after a trillion parameters, whether that could store the information
link |
01:36:42.080
needed, that's architecturally. Do you have thoughts about the limitation on that side of
link |
01:36:46.880
things with neural networks? I mean, so transformers are still a feed forward neural
link |
01:36:52.160
network. It has a very interesting architecture, which is good for text modeling and probably some
link |
01:36:59.200
aspects of video modeling, but it is still a feed forward architecture. You believe in the
link |
01:37:04.560
feedback mechanism, the recursion. Oh, and also causality, being able to do counterfactual
link |
01:37:11.280
reasoning, being able to do interventions, which is actions in the world. So all those things
link |
01:37:20.080
require different kinds of models to be built. I don't think transformers captures that family. It
link |
01:37:28.400
is very good at statistical modeling of text and it will become better and better with more data,
link |
01:37:35.280
bigger models, but that is only going to get so far. So I had this joke on Twitter saying that,
link |
01:37:44.240
hey, this is a model that has read all of quantum mechanics and theory of relativity and we are
link |
01:37:51.600
asking it to do text completion, or we are asking it to solve simple puzzles. When you have AGI,
link |
01:37:59.280
that is not what you ask the system to do. We will ask the system to do experiments and come
link |
01:38:08.240
up with hypothesis and revise the hypothesis based on evidence from experiments, all those things.
link |
01:38:13.680
Those are the things that we want the system to do when we have AGI, not solve simple puzzles.
link |
01:38:20.000
Like impressive demos, somebody generating a red button in HTML.
link |
01:38:24.080
Right, which are all useful. There is no dissing the usefulness of it.
link |
01:38:29.920
So by the way, I am playing a little bit of a devil's advocate, so calm down internet.
link |
01:38:37.280
So I am curious, almost, in which ways a dumb but large neural network will surprise us.
link |
01:38:47.040
I completely agree with your intuition. It is just that I do not want to dogmatically
link |
01:38:58.400
100% put all the chips there. We have been surprised so much. Even the current GPT2 and
link |
01:39:06.160
GPT3 are so surprising. The self play mechanisms of AlphaZero are really surprising. The fact that
link |
01:39:18.640
reinforcement learning works at all to me is really surprising. The fact that neural networks work at
link |
01:39:23.440
all is quite surprising given how nonlinear the space is, the fact that it is able to find local
link |
01:39:30.320
minima that are at all reasonable. It is very surprising. I wonder sometimes whether us humans
link |
01:39:39.760
just want for AGI not to be such a dumb thing. Because exactly what you are saying is like
link |
01:39:52.560
the ideas of concepts and be able to reason with those concepts and connect those concepts in
link |
01:39:57.600
hierarchical ways and then to be able to have world models. Just everything we are describing
link |
01:40:05.360
in human language in this poetic way seems to make sense. That is what intelligence and reasoning
link |
01:40:11.120
are like. I wonder if at the core of it, it could be much dumber. Well, finally it is still
link |
01:40:17.680
connections and messages passing around. So in that way it is dumb. So I guess the recursion,
link |
01:40:24.880
the feedback mechanism, that does seem to be a fundamental kind of thing.
link |
01:40:32.560
The idea of concepts. Also memory. Correct. Having an episodic memory. That seems to be
link |
01:40:39.920
an important thing. So how do we get memory? So we have another piece of work which came
link |
01:40:45.760
out recently on how do you form episodic memories and form abstractions from them.
link |
01:40:52.080
And we haven't figured out all the connections of that to the overall cognitive architecture.
link |
01:40:57.680
But what are your ideas about how you could have episodic memory? So at least it is very clear
link |
01:41:04.720
that you need to have two kinds of memory. That is very, very clear. There are things that happen
link |
01:41:13.600
as statistical patterns in the world, but then there is the one timeline of things that happen
link |
01:41:19.760
only once in your life. And this day is not going to happen ever again. And that needs to be stored
link |
01:41:27.360
as just a stream of strings. This is my experience. And then the question is about
link |
01:41:36.000
how do you take that experience and connect it to the statistical part of it? How do you
link |
01:41:40.880
now say that, okay, I experienced this thing. Now I want to be careful about similar situations.
link |
01:41:47.040
So you need to be able to index that similarity using that other thing, that is, the model of the
link |
01:41:57.920
world that you have learned. Even though the situation came from the episode, you need to be able to
link |
01:42:02.000
index into the other one. So the episodic memory is implemented as an indexing over the other model
link |
01:42:13.200
that you're building. So the memories remain and they're indexed into the statistical thing
link |
01:42:24.000
that you form. Yeah, statistical causal structural model that you built over time. So it's basically
link |
01:42:30.560
the idea is that the hippocampus is just storing or sequencing a set of pointers that happens over
link |
01:42:41.360
time. And then whenever you want to reconstitute that memory and evaluate the different aspects of
link |
01:42:48.880
it, whether it was good, bad, do I need to encounter the situation again? You need the cortex
link |
01:42:55.200
to reinstantiate, to replay that memory. So how do you find that memory? Like which
link |
01:43:00.880
direction is the important direction? Both directions are again, bidirectional.
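As a sketch of the hypothesis being described: the hippocampus-like store keeps only a timeline of pointers (encodings) into the cortex-like model, and a new situation retrieves old moments by similarity of those encodings; replay would then re-drive the model from the retrieved pointer. The random linear-tanh encoder below is a stand-in, not a model of cortex.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))                 # stand-in "cortical" encoder parameters

def encode(observation):                     # cortex-like inference: observation -> code
    return np.tanh(W @ observation)

episodic_store = []                          # timeline of (timestamp, pointer/key) entries

def experience(t, observation):
    episodic_store.append((t, encode(observation)))

def recall(observation, k=1):
    """Return the k past moments most similar to the current situation."""
    query = encode(observation)
    sims = [(float(query @ key), t) for t, key in episodic_store]
    return sorted(sims, reverse=True)[:k]    # replay would re-drive the model from here

for t in range(5):
    experience(t, rng.normal(size=8))        # a one-time stream of experiences
print(recall(rng.normal(size=8)))            # deja vu: which stored moment matches best?
```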
link |
01:43:05.760
I mean, I guess how do you retrieve the memory? So this is again, hypothesis. We're making this
link |
01:43:11.840
up. So when you come to a new situation, your cortex is doing inference over in the new situation.
link |
01:43:21.200
And then of course, hippocampus is connected to different parts of the cortex and you have this
link |
01:43:27.600
deja vu situation, right? Okay, I have seen this thing before. And then in the hippocampus, you can
link |
01:43:35.680
have an index of, okay, this is when it happened as a timeline. And then you can use the hippocampus
link |
01:43:44.480
to drive the similar timelines to say now I am, rather than being driven by my current input
link |
01:43:52.240
stimuli, I am going back in time and rewinding my experience from there, putting back into the
link |
01:43:58.400
cortex. And then putting it back into the cortex of course affects what you're going to see next
link |
01:44:03.680
in your current situation. Got it. Yeah. So that's the whole thing, having a world model and then
link |
01:44:09.280
yeah, connecting to the perception. Yeah, it does seem to be that that's what's happening. On the
link |
01:44:16.320
neural network side, it's interesting to think of how we actually do that. Yeah. To have a knowledge
link |
01:44:24.240
base. Yes. It is possible that you can put many of these structures into neural networks and we will
link |
01:44:31.120
find ways of combining properties of neural networks and graphical models. So, I mean,
link |
01:44:39.440
it's already started happening. Graph neural networks are kind of a merge between them.
link |
01:44:43.840
Yeah. And there will be more of that thing. So, but to me it is, the direction is pretty clear,
link |
01:44:51.440
looking at biology and the evolutionary history of intelligence, it is pretty clear that,
link |
01:44:59.600
okay, what is needed is more structure in the models and modeling of the world and supporting
link |
01:45:06.480
dynamic inference. Well, let me ask you, there's a guy named Elon Musk, there's a company called
link |
01:45:13.600
Neuralink and there's a general field called brain computer interfaces. Yeah. It's kind of an
link |
01:45:20.480
interface between your two loves. Yes. The brain and the intelligence. So there's like
link |
01:45:26.560
very direct applications of brain computer interfaces for people with different conditions,
link |
01:45:32.160
more in the short term. Yeah. But there's also these sci fi futuristic kinds of ideas of AI
link |
01:45:38.320
systems being able to communicate in a high bandwidth way with the brain, bidirectional.
link |
01:45:45.600
Yeah. What are your thoughts about Neuralink and BCI in general as a possibility? So I think BCI
link |
01:45:53.840
is a cool research area. And in fact, when I got interested in brains initially, when I was
link |
01:46:02.240
enrolled at Stanford and when I got interested in brains, it was through a brain computer
link |
01:46:07.840
interface talk that Krishna Shenoy gave. That's when I even started thinking about the problem.
link |
01:46:14.160
So it is definitely a fascinating research area and the applications are enormous. So there is a
link |
01:46:21.200
science fiction scenario of brains directly communicating. Let's keep that aside for the
link |
01:46:26.160
time being. Even just the intermediate milestones that pursuing, which are very reasonable as far
link |
01:46:32.400
as I can see, being able to control an external limb using direct connections from the brain
link |
01:46:40.560
and being able to write things into the brain. So those are all good steps to take and they have
link |
01:46:49.120
enormous applications. People losing limbs being able to control prosthetics, quadriplegics being
link |
01:46:55.280
able to control something, and therapeutics. I also know about another company working in
link |
01:47:01.440
the space called Paradromics. They're based on a different electrode array, but trying to attack
link |
01:47:09.120
some of the same problems. So I think it's a very... Also surgery? Correct. Surgically implanted
link |
01:47:14.800
electrodes. Yeah. So yeah, I think of it as a very, very promising field, especially when it is
link |
01:47:22.560
helping people overcome some limitations. Now, at some point, of course, it will advance to the level of
link |
01:47:29.040
being able to communicate. How hard is that problem do you think? Let's say we magically solve
link |
01:47:37.440
what I think is a really hard problem of doing all of this safely. Yeah. So being able to connect
link |
01:47:45.600
electrodes and not just thousands, but like millions to the brain. I think it's very,
link |
01:47:51.440
very hard because you also do not know what will happen to the brain with that in the sense of how
link |
01:47:58.160
does the brain adapt to something like that? And as we were learning, the brain is quite,
link |
01:48:04.800
in terms of neuroplasticity, is pretty malleable. Correct. So it's going to adjust. Correct. So the
link |
01:48:10.480
machine learning side, the computer side is going to adjust, and then the brain is going to adjust.
link |
01:48:14.480
Exactly. And then what soup does this land us into? The kind of hallucinations you might get
link |
01:48:20.400
from this that might be pretty intense. Just connecting to all of Wikipedia. It's interesting
link |
01:48:28.080
whether we need to be able to figure out the basic protocol of the brain's communication schemes
link |
01:48:34.960
in order to get them to the machine and the brain to talk. Because another possibility is the brain
link |
01:48:41.120
actually just adjust to whatever the heck the computer is doing. Exactly. That's the way I think
link |
01:48:45.280
that I find that to be a more promising way. It's basically saying, okay, attach electrodes
link |
01:48:51.440
to some part of the cortex. Maybe if it is done from birth, the brain will adapt. Let's say that
link |
01:48:58.880
that part is not damaged. It was not used for anything. These electrodes are attached there.
link |
01:49:02.880
And now you train that part of the brain to do this high bandwidth communication between
link |
01:49:09.120
something else. And if you do it like that, then it is brain adapting to... And of course,
link |
01:49:15.680
your external system is designed so that it is adaptable. Just like we designed computers
link |
01:49:21.200
or mouse, keyboard, all of them to be interacting with humans. So of course, that feedback system
link |
01:49:28.720
is designed to be human compatible, but now it is not trying to record from all of the brain.
link |
01:49:37.360
And it's not two systems trying to adapt to each other. It's the brain adapting in one way.
link |
01:49:44.160
That's fascinating. The brain is connected to the internet. Just imagine just connecting it
link |
01:49:51.520
to Twitter and just taking that stream of information. Yeah. But again, if we take a
link |
01:49:59.760
step back, I don't know what your intuition is. I feel like that is not as hard of a problem as
link |
01:50:08.720
doing it safely. There's a huge barrier to surgery because the biological system, it's a mush of
link |
01:50:19.200
like weird stuff. So there's the surgery part of it, the biology part of it, the long-term repercussions
link |
01:50:26.800
part of it. I don't know what else will... We often find after a long time in biology that,
link |
01:50:35.440
okay, that idea was wrong. So people used to cut off the gland called the thymus or something.
link |
01:50:43.680
And then they found that, oh no, that actually causes cancer.
link |
01:50:50.560
And then there's a subtle like millions of variables involved. But this whole process,
link |
01:50:55.440
the nice thing, just like again with Elon, just like colonizing Mars, seems like a ridiculously
link |
01:51:02.000
difficult idea. But in the process of doing it, we might learn a lot about the biology of the
link |
01:51:08.320
neurobiology of the brain, the neuroscience side of things. It's like, if you want to learn
link |
01:51:13.520
something, do the most difficult version of it and see what you learn. The intermediate steps
link |
01:51:19.520
that they are taking sounded all very reasonable to me. It's great. Well, but like everything with
link |
01:51:25.680
Elon, the timeline seems insanely fast. So that's the only question. Well,
link |
01:51:34.000
we've been talking about cognition a little bit. So like reasoning,
link |
01:51:38.640
we haven't mentioned the other C word, which is consciousness. Do you ever think about that one?
link |
01:51:43.840
Is that useful at all in this whole context of what it takes to create an intelligent reasoning
link |
01:51:51.520
being? Or is that completely outside of your, like the engineering perspective of intelligence?
link |
01:51:58.400
It is not outside the realm, but it doesn't inform what we do on a day-to-day basis.
link |
01:52:05.120
But in many ways, the company name is connected to this idea of consciousness.
link |
01:52:12.160
What's the company name? Vicarious. So Vicarious is the company name. And so what does Vicarious
link |
01:52:19.600
mean? At the first level, it is about modeling the world and it is internalizing the external actions.
link |
01:52:29.360
So you interact with the world and learn a lot about the world. And now after having learned
link |
01:52:34.960
a lot about the world, you can run those things in your mind without actually having to act
link |
01:52:42.080
in the world. So you can run things vicariously just in your brain. And similarly, you can
link |
01:52:48.800
experience another person's thoughts by having a model of how that person works
link |
01:52:54.560
and running that, putting yourself in some other person's shoes. So that is being vicarious.
link |
01:53:01.280
Now it's the same modeling apparatus that you're using to model the external world
link |
01:53:06.800
or some other person's thoughts. You can turn it to yourself. If that same modeling thing is
link |
01:53:14.320
applied to your own modeling apparatus, then that is what gives rise to consciousness, I think.
link |
01:53:21.040
Well, that's more like self awareness. There's the hard problem of consciousness, which is
link |
01:53:25.840
when the model feels like something, when this whole process is like you really are in it.
link |
01:53:37.680
You feel like an entity in this world. Not just you know that you're an entity, but it feels like
link |
01:53:43.920
something to be that entity. And thereby, we attribute this... then it starts to be that
link |
01:53:54.400
something that has consciousness can suffer. You start to have these kinds of things that we can
link |
01:53:59.120
reason about, and that is much heavier. It seems like there's a much greater cost to your decisions.
link |
01:54:09.520
And mortality is tied up into that. The fact that these things end. First of all, I end at some
link |
01:54:18.640
point, and then other things end. That somehow seems to be, at least for us humans, a deep
link |
01:54:27.840
motivator. That idea of motivation in general, we talk about goals in AI, but goals aren't quite
link |
01:54:38.320
the same thing as our mortality. It feels like, first of all, humans don't have a goal, and they
link |
01:54:46.560
just kind of create goals at different levels. We make up goals because we're terrified by
link |
01:54:54.240
the mystery of the thing that gets us all. We make these goals up. We're like a goal generation
link |
01:55:02.880
machine, as opposed to a machine which optimizes the trajectory towards a singular goal. It feels
link |
01:55:10.880
like that's an important part of cognition, that whole mortality thing. Well, it is a part of human
link |
01:55:18.480
cognition, but there is no reason for that mortality to come into the equation for an artificial
link |
01:55:30.080
system, because we can copy the artificial system. The problem with humans is that I can't clone
link |
01:55:36.800
you. Even if I clone you, as in the hardware, your experience that was stored in your brain,
link |
01:55:45.760
your episodic memory, all those will not be captured in the new clone. But that's not the
link |
01:55:52.880
same with an AI system. But it's also possible that the thing that you mentioned with us humans
link |
01:56:02.320
is actually of fundamental importance for intelligence. The fact that you can copy an AI
link |
01:56:07.760
system means that that AI system is not yet an AGI. If you look at existence proof, if we reason
link |
01:56:18.240
based on existence proof, you could say that it doesn't feel like death is a fundamental property
link |
01:56:24.080
of an intelligent system. But we don't have one yet. Give me an example of an immortal intelligent being.
link |
01:56:33.840
We don't have those. It's very possible that that is a fundamental property of intelligence,
link |
01:56:42.240
being a thing that has a deadline for itself. Well, you can think of it like this. Suppose you invent
link |
01:56:49.840
a way to freeze people for a long time. It's not dying. So you can be frozen and woken up
link |
01:56:58.160
thousands of years from now. So there's no fear of death. Well, no, it's not about time. It's about
link |
01:57:08.000
the knowledge that it's temporary. And that aspect of it, the finiteness of it, I think
link |
01:57:17.120
creates a kind of urgency. Correct. For us, for humans. Yeah, for humans. Yes. And that is part
link |
01:57:23.200
of our drives. And that's why I'm not too worried about AI having motivations to kill all humans
link |
01:57:35.040
and those kinds of things. Why? Just wait. So why do you need to do that? I've never heard that
link |
01:57:43.440
before. That's a good point. Yeah, just murder seems like a lot of work. Let's just wait it out.
link |
01:57:52.560
They'll probably hurt themselves. Let me ask you, people often kind of wonder, world class researchers
link |
01:58:01.440
such as yourself, what kind of books, technical, fiction, philosophical, had an impact on you and
link |
01:58:10.320
your life and maybe ones you could possibly recommend that others read? Maybe if you have
link |
01:58:17.920
three books that pop into mind. Yeah. So I definitely liked Judea Pearl's book,
link |
01:58:23.920
Probabilistic Reasoning in Intelligent Systems. It's a very deep technical book. But what I liked
link |
01:58:30.640
is that, so there are many places where you can learn about probabilistic graphical models from.
link |
01:58:36.400
But throughout this book, Judea Pearl kind of sprinkles his philosophical observations and he
link |
01:58:42.960
thinks about, connects it to how the brain thinks, and attention and resources, all those things. So
link |
01:58:48.400
that whole thing makes it more interesting to read. He emphasizes the importance of causality.
link |
01:58:54.400
So that was in his later book. So this was the first book, Probabilistic Reasoning in Intelligent
link |
01:58:58.800
Systems. He mentions causality, but he hadn't really sunk his teeth into causality. But he
link |
01:59:05.040
really sunk his teeth into how you actually formalize it. And the second book,
link |
01:59:11.360
Causality, the one in 2000, that one is really hard. So I would recommend that.
link |
01:59:17.840
Yeah. So that looks at the mathematical, his model of...
link |
01:59:22.560
Do-calculus.
link |
01:59:23.120
Do-calculus. Yeah. It was pretty dense mathematically.
link |
01:59:25.520
Right. The Book of Why is definitely more enjoyable.
link |
01:59:28.880
For sure.
link |
01:59:29.360
Yeah. So I would recommend Probabilistic Reasoning in Intelligent Systems.
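For readers who want a taste of what the do-calculus actually formalizes, here is the classic back-door adjustment, written out as an illustrative aside rather than anything discussed in the conversation. Assuming a set of covariates Z that satisfies Pearl's back-door criterion relative to (X, Y), that is, no variable in Z is a descendant of X and Z blocks every back-door path from X to Y, the interventional distribution reduces to ordinary conditional probabilities:

    \[ P(y \mid \mathrm{do}(x)) \;=\; \sum_{z} P(y \mid x, z)\, P(z) \]

This is the kind of identification result that the Causality book develops rigorously and The Book of Why explains informally.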
link |
01:59:34.160
Another book I liked was one from Doug Hofstadter. This was a long time ago. He had a book,
link |
01:59:41.360
I think it was called The Mind's I. It was probably Hofstadter and Daniel Dennett together.
link |
01:59:49.200
Yeah. And I actually, I bought that book. It's on my shelf. I haven't read it yet,
link |
01:59:54.880
but I couldn't get an electronic version of it, which is annoying because I read everything on
link |
02:00:00.800
Kindle. So I had to actually purchase the physical copy. It's one of the only physical books
link |
02:00:06.560
I have because anyway, a lot of people recommended it highly. So yeah.
link |
02:00:11.200
And the third one I would definitely recommend reading is, this is not a technical book. It is
link |
02:00:18.720
history. The name of the book, I think, is The Bishop's Boys. It's about the Wright brothers
link |
02:00:25.040
and their path and how it was... There are multiple books on this topic and all of them
link |
02:00:34.560
are great. It's fascinating how flight was treated as an unsolvable problem. And also,
link |
02:00:46.400
what aspects did people emphasize? People thought, oh, it is all about
link |
02:00:51.520
just powerful engines. You just need to have powerful lightweight engines. And so some people
link |
02:01:00.160
thought of it as, how far can we just throw the thing? Just throw it.
link |
02:01:04.000
Like a catapult.
link |
02:01:05.040
Yeah. So it's very fascinating. And even after they made the invention,
link |
02:01:11.520
people were not believing it.
link |
02:01:13.040
Ah, the social aspect of it.
link |
02:01:15.360
The social aspect. It's very fascinating.
link |
02:01:18.240
I mean, do you draw any parallels here? Birds fly, so there's the natural approach to flight
link |
02:01:28.320
and then there's the engineered approach. Do you see the same kind of thing with the brain
link |
02:01:33.920
and our trying to engineer intelligence?
link |
02:01:37.280
Yeah. It's a good analogy to have. Of course, all analogies have their limits.
link |
02:01:43.920
So people in AI often use airplanes as an example of, hey, we didn't learn anything from birds.
link |
02:01:55.120
But the funny thing is that, and the saying is, airplanes don't flap wings. This is what they
link |
02:02:02.560
say. The funny thing and the ironic thing is that the fact that you don't need to flap to fly is something
link |
02:02:09.520
the Wright brothers found by observing birds. In some of these books,
link |
02:02:18.640
they show their notebook drawings. They make detailed notes about buzzards just soaring over
link |
02:02:26.240
thermals. And they basically say, look, flapping is not the important thing, propulsion is not the
link |
02:02:31.440
important problem to solve here. We want to solve control. And once you solve control,
link |
02:02:37.120
propulsion will fall into place. All of these are things they realized by observing birds.
link |
02:02:44.400
Beautifully put. That's actually brilliant because people do use that analogy a lot. I'm
link |
02:02:49.280
going to have to remember that one. Do you have advice for people interested in artificial
link |
02:02:54.480
intelligence like young folks today? I talk to undergraduate students all the time,
link |
02:02:59.200
interested in neuroscience, interested in understanding how the brain works. Is there
link |
02:03:03.840
advice you would give them about their career, maybe about their life in general?
link |
02:03:09.520
Sure. I think every piece of advice should be taken with a pinch of salt, of course,
link |
02:03:14.720
because each person is different, their motivations are different. But I can definitely
link |
02:03:20.400
say if your goal is to understand the brain from the angle of wanting to build one, then
link |
02:03:28.480
being an experimental neuroscientist might not be the way to go about it. A better way to pursue it
link |
02:03:36.240
might be through computer science, electrical engineering, machine learning, and AI. And of
link |
02:03:42.560
course, you have to study the neuroscience, but that you can do on your own. If you're more
link |
02:03:48.800
attracted by discovering something intriguing about the brain,
link |
02:03:53.680
then of course, it is better to be an experimentalist. So find that motivation,
link |
02:03:58.480
what are you intrigued by? And of course, find your strengths too. Some people are very good
link |
02:04:03.120
experimentalists and they enjoy doing that. And it's interesting to see which department,
link |
02:04:10.160
if you're picking in terms of your education path, whether to go with like, at MIT, it's
link |
02:04:18.880
Brain and Cognitive Sciences, BCS. Yeah. Brain and Cognitive Sciences, yeah. Or the CS side of
link |
02:04:29.120
things. And actually the brain folks, the neuroscience folks are more and more now
link |
02:04:34.240
embracing learning TensorFlow and PyTorch, right? They see the power of trying to engineer
link |
02:04:44.400
ideas that they get from the brain, and then explore how those could be used to create
link |
02:04:52.720
intelligent systems. So that might be the right department actually. Yeah. So this was a question
link |
02:04:58.640
in one of the Redwood Neuroscience Institute workshops that Jeff Hawkins organized almost 10
link |
02:05:06.160
years ago. This question was put to a panel, right? What should be the undergrad major you should
link |
02:05:11.040
take if you want to understand the brain? And the majority opinion in that one was electrical
link |
02:05:17.200
engineering. Interesting. Because, I mean, I'm a double E undergrad, so I got lucky in that way.
link |
02:05:25.040
But I think it does have some of the right ingredients because you learn about circuits.
link |
02:05:30.080
You learn about how you can construct circuits to do functions. You learn about
link |
02:05:37.920
microprocessors. You learn information theory. You learn signal processing. You learn continuous
link |
02:05:43.040
math. So in that way, it's a good step. If you want to go to computer science or neuroscience,
link |
02:05:50.880
it's a good step. The downside, you're more likely to be forced to use MATLAB.
link |
02:05:56.640
You're more likely to be forced to use MATLAB. So one of the interesting things about, I mean,
link |
02:06:07.920
this is changing. The world is changing. But certain departments lagged on the programming
link |
02:06:13.840
side of things, on developing good habits in terms of software engineering. But I think that's more
link |
02:06:19.280
and more changing. And students can take that into their own hands, like learn to program. I feel
link |
02:06:26.000
like everybody should learn to program, everyone in the sciences, because it
link |
02:06:34.800
empowers you, it puts the data at your fingertips. So you can organize it. You can find all kinds of
link |
02:06:40.400
things in the data. And then you can also, for the appropriate sciences, build systems
link |
02:06:46.240
based on that, and then engineer intelligent systems.
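As a small, concrete illustration of that "data at your fingertips" point, here is a minimal Python sketch; it is not anything from the conversation, and the file name, column names, and the 50 Hz cutoff are hypothetical placeholders for whatever data a student might actually have.

    import csv
    from statistics import mean

    # Hypothetical CSV of recordings with columns: neuron_id, condition, rate_hz.
    with open("recordings.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Organize: group firing rates by experimental condition.
    by_condition = {}
    for row in rows:
        by_condition.setdefault(row["condition"], []).append(float(row["rate_hz"]))

    # Explore: summarize each condition.
    for condition, rates in sorted(by_condition.items()):
        print(f"{condition}: n={len(rates)}, mean rate = {mean(rates):.2f} Hz")

    # Find things in the data: flag neurons above an arbitrary 50 Hz cutoff.
    active = sorted({row["neuron_id"] for row in rows if float(row["rate_hz"]) > 50.0})
    print("neurons above 50 Hz:", active)

A few lines like these replace a lot of manual spreadsheet work, which is the sense in which programming empowers someone in the sciences.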
link |
02:06:49.760
We already talked about mortality. So we hit a ridiculous point. But let me ask you,
link |
02:07:04.800
one of the things about intelligence is that it's goal-driven. And you study the brain. So the question
link |
02:07:13.200
is like, what's the goal that the brain is operating under? What's the meaning of it all
link |
02:07:17.360
for us humans in your view? What's the meaning of life? The meaning of life is whatever you
link |
02:07:23.920
construct out of it. It's completely open. It's open. So there's nothing, like you mentioned,
link |
02:07:31.760
you like constraints. So it's wide open. Is there some useful aspect that you think about in terms
link |
02:07:42.000
of like the openness of it and just the basic mechanisms of generating goals in studying
link |
02:07:50.480
cognition in the brain that you think about? Or is it just about, because everything we've talked
link |
02:07:56.640
about kind of the perception system is to understand the environment. That's like to be
link |
02:08:00.640
able to like not die, like not fall over and like be able to, you don't think we need to
link |
02:08:09.360
think about anything bigger than that. Yeah, I think so, because it's basically being able to
link |
02:08:16.160
understand the machinery of the world such that you can pursue whatever goals you want.
link |
02:08:21.600
So the machinery of the world is really ultimately what we should be striving to understand. The
link |
02:08:26.800
rest is just whatever the heck you want to do or whatever fun you have.
link |
02:08:31.840
Or whatever is culturally popular. I think that's beautifully put. I don't think there's a better
link |
02:08:42.640
way to end it. Dileep, I'm so honored that you show up here and waste your time with me. It's
link |
02:08:49.840
been an awesome conversation. Thanks so much for talking today. Oh, thank you so much. This was
link |
02:08:54.400
so much more fun than I expected. Thank you. Thanks for listening to this conversation with
link |
02:09:00.880
Dileep George. And thank you to our sponsors, Babbel, Raycon Earbuds, and Masterclass. Please
link |
02:09:07.920
consider supporting this podcast by going to babbel.com and using code LEX, going to buyraycon.com,
link |
02:09:16.080
and signing up at masterclass.com. Click the links, get the discount. It really is the best
link |
02:09:22.240
way to support this podcast. If you enjoy this thing, subscribe on YouTube, review it with five
link |
02:09:27.440
stars on Apple Podcast, support it on Patreon, or connect with me on Twitter at Lex Friedman,
link |
02:09:33.920
spelled, yes, without the E, just F R I D M A N. And now let me leave you with some words from Marcus
link |
02:09:43.120
Aurelius. You have power over your mind, not outside events. Realize this and you will find
link |
02:09:51.360
strength. Thank you for listening and hope to see you next time.