
Dileep George: Brain-Inspired AI | Lex Fridman Podcast #115



link |
00:00:00.000
The following is a conversation with Dileep George, a researcher at the intersection of
link |
00:00:05.360
neuroscience and artificial intelligence, cofounder of Vicarious with Scott Phoenix,
link |
00:00:10.880
and formerly cofounder of Numenta with Jeff Hawkins, who has been on this podcast, and Donna
link |
00:00:17.280
Dubinsky. From his early work on hierarchical temporal memory to recursive cortical networks
link |
00:00:23.520
to today, Dileep has always sought to engineer intelligence that is closely inspired by the
link |
00:00:29.600
human brain. As a side note, I think we understand very little about the fundamental principles
link |
00:00:35.760
underlying the function of the human brain, but the little we do know gives hints that may be
link |
00:00:41.600
more useful for engineering intelligence than any idea in mathematics, computer science, physics,
link |
00:00:46.960
and scientific fields outside of biology. And so the brain is a kind of existence proof that says
link |
00:00:53.120
it's possible, keep at it. I should also say that brain-inspired AI is often overhyped and used as
link |
00:01:01.040
fodder, just as quantum computing is, for marketing speak. But I'm not afraid of exploring these
link |
00:01:08.000
sometimes overhyped areas since where there's smoke, there's sometimes fire.
link |
00:01:13.680
Quick summary of the ads: three sponsors, Babbel, Raycon earbuds, and Masterclass. Please consider
link |
00:01:20.480
supporting this podcast by clicking the special links in the description to get the discount.
link |
00:01:25.760
It really is the best way to support this podcast. If you enjoy this thing, subscribe on
link |
00:01:30.880
YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or connect with me on Twitter
link |
00:01:36.400
at Lex Fridman. As usual, I'll do a few minutes of ads now and never any ads in the middle that
link |
00:01:42.480
can break the flow of the conversation. This show is sponsored by Babbel, an app and website
link |
00:01:48.880
that gets you speaking in a new language within weeks. Go to Babbel.com and use code Lex to get
link |
00:01:54.240
three months free. They offer 14 languages, including Spanish, French, Italian, German,
link |
00:02:00.800
and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100
link |
00:02:08.640
language experts. Let me read a few lines from the Russian poem Noch, Ulitsa, Fonar, Apteka
link |
00:02:16.480
by Alexander Blok, that you'll start to understand if you sign up to Babbel. Noch, ulitsa, fonar, apteka...
link |
00:02:35.520
I say that you'll only start to understand this poem because Russian starts with the language
link |
00:02:41.440
and ends with the vodka. Now the latter part is definitely not endorsed or provided by Babbel
link |
00:02:47.600
and will probably lose me this sponsorship. But once you graduate from Babbel, you can enroll in my
link |
00:02:52.640
advanced course of late night Russian conversation over vodka. I have not yet developed enough for
link |
00:02:58.000
that. It's in progress. So get started by visiting Babbel.com and use code Lex to get three months free.
link |
00:03:05.280
This show is sponsored by Raycon earbuds. Get them at buyraycon.com slash Lex. They've become my main
link |
00:03:12.080
method of listening to podcasts, audio books and music when I run, do pushups and pull ups, or just
link |
00:03:18.000
living life. In fact, I often listen to brown noise with them. When I'm thinking deeply about
link |
00:03:23.520
something, it helps me focus. They're super comfortable, pair easily, great sound, great bass,
link |
00:03:30.480
six hours of playtime. I've been putting in a lot of miles to get ready for a potential ultramarathon
link |
00:03:36.720
and listening to audiobooks on World War II. The sound is rich and really comes in clear. So again,
link |
00:03:45.200
get them at buyraycon.com slash Lex. This show is sponsored by masterclass. Sign up at masterclass.com
link |
00:03:52.800
slash Lex to get a discount and to support this podcast. When I first heard about masterclass,
link |
00:03:58.480
I thought it was too good to be true. I still think it's too good to be true. For 180 bucks a year,
link |
00:04:04.160
you get an all access pass to watch courses from, to list some of my favorites, Chris Hadfield on
link |
00:04:10.160
Space Exploration, Neil deGrasse Tyson on Scientific Thinking and Communication, Will Wright,
link |
00:04:15.600
creator of SimCity and Sims on Game Design. Every time I do this read, I really want to play
link |
00:04:21.440
a city builder game. Carlos Santana on Guitar, Garry Kasparov on Chess, Daniel Negreanu on Poker,
link |
00:04:28.800
and many more. Chris Hadfield explaining how rockets work and the experience of being launched into
link |
00:04:34.400
space alone is worth the money. By the way, you can watch it on basically any device. Once again,
link |
00:04:40.640
sign up at masterclass.com slash Lex to get a discount and to support this podcast. And now here's my
link |
00:04:46.320
conversation with Dileep George. Do you think we need to understand the brain in order to build it?
link |
00:04:53.600
Yes, if you want to build the brain, we definitely need to understand how it works. So Blue Brain
link |
00:05:00.240
or Henry Markram's project is trying to build the brain without understanding it, like, you know,
link |
00:05:05.520
just trying to put details of the brain from neuroscience experiments into a giant simulation
link |
00:05:14.480
by putting more and more neurons, more and more details. But that is not going to work because
link |
00:05:22.880
when it doesn't perform as what you expect it to do, then what do you do? You just keep adding
link |
00:05:29.040
more details. How do you debug it? So unless you understand, unless you have a theory about
link |
00:05:35.680
how the system is supposed to work, how the pieces are supposed to fit together,
link |
00:05:39.200
what they're going to contribute, you can't build it.
link |
00:05:42.160
At the functional level, understand. So can you actually linger on and describe the Blue Brain
link |
00:05:47.760
project? It's kind of a fascinating principle and idea to try to simulate the brain. We're talking
link |
00:05:56.160
about the human brain, right? Right. Human brains and rat brains or cat brains have lots in common
link |
00:06:03.360
in that the cortex, the neocortex structure, is very similar. So initially, they were trying
link |
00:06:10.240
to just simulate a cat brain. And to understand the nature of evil.
link |
00:06:17.120
To understand the nature of evil. Or as it happens in most of these simulations,
link |
00:06:23.440
you easily get one thing out, which is oscillations. Yeah, if you simulate a large number of neurons,
link |
00:06:30.720
they oscillate. And you can adjust the parameters and say that, oh, oscillation
link |
00:06:36.240
match the rhythm that we see in the brain, et cetera. Oh, I see. So the idea is,
link |
00:06:43.280
is the simulation at the level of individual neurons? Yeah. So the Blue Brain project,
link |
00:06:49.040
the original idea as proposed was, you put very detailed biophysical neurons,
link |
00:06:58.000
biophysical models of neurons, and you interconnect them according to the
link |
00:07:03.440
statistics of connections that we have found from real neuroscience experiments.
link |
00:07:08.480
And then turn it on and see what happens. And these neural models are, you know,
link |
00:07:15.360
incredibly complicated in themselves, right? Because these neurons are
link |
00:07:20.560
modeled using this idea called Hodgkin-Huxley models, which are about how signals propagate
link |
00:07:27.360
in a cable. And there are active dendrites, all those phenomena, which those phenomena themselves,
link |
00:07:33.920
we don't understand that well. And then we put in connectivity, which is part guesswork,
link |
00:07:40.960
part, you know, observed. And of course, if you do not have any theory about how it is supposed to
link |
00:07:46.000
work, we, you know, we just have to take whatever comes out of it as, okay, this is something
link |
00:07:54.160
interesting. But in your sense, like these models of the way signal travels along, like with the
link |
00:07:59.920
axons and all the basic models, that's, they're too crude. Oh, well, actually, they are pretty
link |
00:08:06.800
detailed and pretty sophisticated. And they do replicate the neural dynamics. If you take a
link |
00:08:15.200
single neuron, and you try to turn on the different channels, the calcium channels and
link |
00:08:22.160
the different receptors, and see what the effect of turning on or off those channels are in the
link |
00:08:30.240
neuron's spike output, people have built pretty sophisticated models of that. And they are,
link |
00:08:37.040
I would say, you know, in the regime of correct.
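To give readers a concrete feel for the kind of biophysical single-neuron model being described, here is a minimal Hodgkin-Huxley-style sketch: one membrane compartment with sodium, potassium, and leak channels, integrated with simple Euler steps. The parameters are the standard textbook squid-axon values, not Blue Brain's; real cortical models add many more channel types, compartments, and dendritic detail.

```python
import numpy as np

# Minimal Hodgkin-Huxley single-compartment neuron (classic textbook values).
C_m = 1.0                              # membrane capacitance, uF/cm^2
g_Na, g_K, g_L = 120.0, 36.0, 0.3      # max channel conductances, mS/cm^2
E_Na, E_K, E_L = 50.0, -77.0, -54.4    # reversal potentials, mV

# Voltage-dependent opening/closing rates for the gating variables m, h, n.
def alpha_m(V): return 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
def beta_m(V):  return 4.0 * np.exp(-(V + 65) / 18)
def alpha_h(V): return 0.07 * np.exp(-(V + 65) / 20)
def beta_h(V):  return 1.0 / (1 + np.exp(-(V + 35) / 10))
def alpha_n(V): return 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
def beta_n(V):  return 0.125 * np.exp(-(V + 65) / 80)

def simulate(I_ext=10.0, T=50.0, dt=0.01):
    """Integrate the membrane equation with forward Euler; returns V(t) in mV."""
    steps = int(T / dt)
    V, m, h, n = -65.0, 0.05, 0.6, 0.32          # approximate resting state
    trace = np.empty(steps)
    for i in range(steps):
        # Gating variables relax toward their voltage-dependent steady states.
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        # Ionic currents through the sodium, potassium, and leak channels.
        I_Na = g_Na * m**3 * h * (V - E_Na)
        I_K  = g_K * n**4 * (V - E_K)
        I_L  = g_L * (V - E_L)
        V += dt * (I_ext - I_Na - I_K - I_L) / C_m
        trace[i] = V
    return trace

spikes = simulate()                  # a constant 10 uA/cm^2 input drives spiking
print(round(spikes.max(), 1), round(spikes.min(), 1))
```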
link |
00:08:40.560
Well, see, the correctness, that's interesting, because you mentioned several levels.
link |
00:08:45.680
The correctness is measured by looking at some kind of aggregate statistics?
link |
00:08:48.960
It would be more of the, the spiking dynamics. Of the spiking dynamics, I see. Okay.
link |
00:08:55.760
And, and yeah, these models, because they're, they are going to the level of mechanism,
link |
00:09:00.400
right? So they are basically looking at, okay, what, what is the effect of turning on an ion
link |
00:09:05.680
channel? And, and you can, you can model that using electric circuits in and then so they are
link |
00:09:12.560
modeled. So it is not just function fitting; people are looking at the mechanism underlying
link |
00:09:18.640
it and putting that in terms of electric circuit theory, signal propagation theory and, and modeling
link |
00:09:25.920
that. And so those models are sophisticated, but getting a single neuron's model 99% right
link |
00:09:34.960
still does not tell you how to, you know, it would be the analog of getting a transistor
link |
00:09:42.000
model right and now trying to build a microprocessor. And if you, if you just observe, you know,
link |
00:09:49.120
if you did not understand how a microprocessor works, but you say, oh, I know I can model
link |
00:09:54.720
one transistor well, and now I will just try to interconnect the transistors according to whatever
link |
00:10:00.880
I could, you know, guess from the experiments and try to simulate it. Then it is very unlikely
link |
00:10:08.000
that you will produce a functioning microprocessor. You want to, you know, when you want to produce
link |
00:10:13.520
a functioning microprocessor, you want to understand Boolean logic, how does, how do the, the gates
link |
00:10:18.640
work, all those things. And then, you know, understand how do those gates get implemented
link |
00:10:22.880
using transistors. Yeah, there's actually, I remember, this reminds me of a paper,
link |
00:10:26.960
maybe you're familiar with it. I remember going through it in a reading group. It approaches a
link |
00:10:33.440
microprocessor from a perspective of a neuroscientist. I think it basically, it uses all the tools that
link |
00:10:41.040
we have in neuroscience to try to understand it, as if we, like aliens, just showed up to study
link |
00:10:46.320
computers. Yeah. And to see if those tools could be used to get any kind of sense of how the
link |
00:10:53.120
microprocessor works. I think the final takeaway from at least this initial exploration is
link |
00:11:00.320
that we're screwed. There's no way that the tools of neuroscience would be able to get us
link |
00:11:05.440
to anything, like not even Boolean logic. I mean, it's just any aspect of the architecture of the
link |
00:11:15.600
function of the processes involved, the clocks, the timing, all that. You can't figure that out
link |
00:11:22.640
from the tools of neuroscience. Yeah. So I'm very familiar with this particular paper. I think it
link |
00:11:28.160
was called, Could a Neuroscientist Understand a Microprocessor, or something like that?
link |
00:11:35.680
Following the methodology in that paper, even an electrical engineer would not understand microprocessors.
link |
00:11:41.360
So I couldn't. So I don't think it is that bad in the sense of saying,
link |
00:11:48.080
neuroscientists do find valuable things by observing the brain. They do find good insights.
link |
00:11:58.320
But those insights cannot be put together just as a simulation. You have to investigate
link |
00:12:06.000
what are the computational underpinnings of those findings? How do all of them fit together from
link |
00:12:13.280
an information processing perspective? You have to, somebody has to painstakingly put those things
link |
00:12:20.080
together and build hypotheses. So I don't want to diss all of neuroscience by saying, oh,
link |
00:12:25.120
they are not finding anything. No, that paper almost went to that level of neuroscientists
link |
00:12:30.880
will never understand. No, that's not true. I think they do find lots of useful things,
link |
00:12:35.840
but it has to be put together in a computational framework.
link |
00:12:38.880
Yeah, I mean, AI systems will be listening to this podcast 100 years from now,
link |
00:12:46.400
and there's some nonzero probability they'll find your words laughable.
link |
00:12:52.480
Like, remember when humans thought they understood something about the brain but were
link |
00:12:56.960
totally clueless. There's a sense about neuroscience that we may be in the very, very early days of
link |
00:13:02.160
understanding the brain. But I mean, that's one perspective. In your perspective, how far are we
link |
00:13:11.360
into understanding any aspect of the brain? From the dynamics of individual neural
link |
00:13:20.720
communication, to how, in a collective sense, they're able to store information,
link |
00:13:29.040
transfer information, to how intelligence emerges, all that kind of stuff. Where are we
link |
00:13:33.920
on that timeline? Yeah. So timelines are very, very hard to predict. And you can, of course,
link |
00:13:39.360
be wrong. And it can be wrong on either side. We know that when we look back, the first flight
link |
00:13:48.800
was in 1903. In 1900, there was a New York Times article on flying machines that do not
link |
00:13:57.280
fly. And, you know, humans might not fly for another 100 years. That was what that article
link |
00:14:04.160
stated. And so, but no, they flew three years after that. So it is, you know, it's very hard to,
link |
00:14:10.880
so. And on that point, one of the Wright brothers, I think two years before,
link |
00:14:17.120
said that, like, he said some number like 50 years; he had become convinced that it's
link |
00:14:26.160
impossible. Even during their experimentation? Yeah. Yeah. Yeah. I mean,
link |
00:14:32.240
that's a tribute to when it, that's like the entrepreneurial battle of like depression of
link |
00:14:37.280
going through just like thinking there's this is impossible. But there, yeah, there's something,
link |
00:14:41.920
even the person that's in it is not able to see, uh, estimate correctly. Exactly. But I can,
link |
00:14:48.160
I can tell from the point of, you know, objectively, what are the things that we
link |
00:14:52.480
know about the brain and how that can be used to build AI models, which can then go back and
link |
00:14:58.560
inform how the brain works. Um, so my way of understanding the brain would be to basically
link |
00:15:03.520
say, look at the insights neuroscientists have found, understand that from, uh, a computational
link |
00:15:10.480
angle, information processing angle, build models using that. And then building the, that model,
link |
00:15:17.760
which, which functions with them, which is a functional model, which is, which is doing the
link |
00:15:21.520
task that we want the model to do. It is not just trying to model a phenomena in the brain. It is,
link |
00:15:26.960
it is trying to do what the brain is trying to do on the, on the whole, uh, functional level.
link |
00:15:32.240
And building that model will help you fill in the missing pieces that, you know, biology just
link |
00:15:37.920
gives you the hints and building the model, you know, fills in the rest of the pieces of the puzzle.
link |
00:15:44.560
And then you can go and connect that back to biology and say, okay, now it makes sense that
link |
00:15:49.760
this part of the brain is, uh, doing this or this layer in the cortical circuit is doing this. Uh,
link |
00:15:57.440
and, and, and then continue this iteratively because now that will inform new experiments
link |
00:16:03.920
in neuroscience. And of course, you know, building the model and verifying that in the real world,
link |
00:16:08.800
will you, will also tell you more about does the model actually work? Uh, and you can refine the
link |
00:16:14.240
model, find better ways of putting these neuroscience insights together. So, so I would say it is,
link |
00:16:21.120
it is, you know, it, so neuroscience alone, just from experimentation, will not be able
link |
00:16:27.680
to build a model of the, of the brain, uh, a functional model of the brain. So we, you know,
link |
00:16:32.800
there, there's, uh, lots of efforts, which are very impressive efforts in collecting more and more
link |
00:16:38.800
connectivity data from the brain. Yeah. You know, how, how are the micro circuits of the brain
link |
00:16:44.720
connected with each other? Those are beautiful, by the way. Those are beautiful. Uh, and at the
link |
00:16:50.000
same time, those do not by themselves convey the story of how it works.
link |
00:16:56.880
Yeah. Um, and, and somebody has to understand, okay, why are they connected like that? And
link |
00:17:02.080
what, what are those things doing? Uh, and, and we do that by building models in AI using hints
link |
00:17:08.720
from neuroscience, and repeat the cycle. So what aspects of the brain are useful in this
link |
00:17:16.720
whole endeavor, which by the way, I should say you're, you're both a neuroscientist and, and AI
link |
00:17:21.680
person, I guess the dream is to both understand the brain and to build AGI systems. So you're,
link |
00:17:27.760
you're, it's like an engineer's perspective of trying to understand the brain. So what aspects
link |
00:17:34.640
of the brain, functionally speaking, like you said, do you find interesting? Yeah. Quite a lot
link |
00:17:39.680
of things. All right. So one is, um, you know, if you look at the visual cortex, um, uh, and,
link |
00:17:46.400
and, you know, the visual cortex is, is a large part of the brain. Uh, I forgot the exact fraction,
link |
00:17:52.320
but a huge part of our brain area is occupied by just, just vision. Um, so vision,
link |
00:18:00.080
visual cortex is not just a feed forward cascade of neurons. Um, uh, there are a lot more feedback
link |
00:18:07.040
connections in the brain compared to the feed forward connections. And, and it is surprising
link |
00:18:13.040
to the level of detail neuroscientists have actually studied this. If you, if you go into
link |
00:18:17.200
neuroscience literature and poke around and ask, you know, have they studied what will be the effect
link |
00:18:22.960
of poking a neuron in, uh, level IT, uh, in level V1? And, um, have they studied that? Uh, and you
link |
00:18:32.720
will say, yes, they have studied that every part of every possible combination. I mean, it's, it's a,
link |
00:18:39.520
it's not a random exploration at all. It's a very hypothesis driven, right? They, they are very,
link |
00:18:44.400
uh, experimental neuroscientists are very, very systematic in how they probe the brain. Uh,
link |
00:18:49.120
because experiments are very costly to conduct. They take a lot of preparation. They, they need a
link |
00:18:54.400
lot of control. So they, they are very hypothesis driven in how they probe the brain. And, um,
link |
00:18:59.280
often what I find is that when we have a question in, um, in AI, uh, about have, has anybody probed,
link |
00:19:07.280
probed how lateral connections in the brain works? And when you go and read the literature,
link |
00:19:11.840
yes, people have probed it and people have probed it very systematically. And, and
link |
00:19:16.000
they have hypothesis about how those lateral connections are supposedly contributing to
link |
00:19:22.720
visual processing. Uh, but of course they haven't built very detailed functional models of it.
link |
00:19:28.640
By the way, how do these studies work, sorry to interrupt? Do they, do they stimulate like a neuron
link |
00:19:33.680
in one particular area of the visual cortex and then see how the signal travels,
link |
00:19:38.720
kind of thing? Fascinating, very, very fascinating experiments. Right. So I can,
link |
00:19:42.160
I can give you one example I was impressed with. Um, this is, uh, so before going to that, let me,
link |
00:19:47.120
like, let me give you, uh, uh, you know, a, uh, overview of how the, the layers in the cortex
link |
00:19:52.640
are organized, right? Uh, visual cortex is organized into roughly four hierarchical levels.
link |
00:19:58.320
Okay. So, uh, V1, V2, V4, IT. And then V3, uh, well, yeah, that's in another pathway.
link |
00:20:06.000
Okay. So there is this, I'm talking about just the object recognition pathway.
link |
00:20:10.400
All right. Cool. And then, um, in V1 itself, um, so it's, there is a very detailed micro
link |
00:20:18.080
circuit in V1 itself that is, that is organization within a level itself. Um, the cortical sheet
link |
00:20:23.440
is organized into, uh, you know, multiple layers, and there is a columnar structure, and this,
link |
00:20:29.920
this layer-wise and columnar structure is repeated in V1, V2, V4, uh, IT, all of them.
link |
00:20:36.560
Right. Uh, and, and the connections between these layers within a level with, you know,
link |
00:20:42.320
in V1 itself, there are six layers roughly and the connections between them, there is a particular
link |
00:20:47.040
structure to them. Uh, and, um, now, so one example of, uh, an experiment, uh, uh, people did is
link |
00:20:55.520
when I, when you present a stimulus, uh, which is, um, let's say requires, um, separating the
link |
00:21:03.280
foreground from the background of an object. So it is a, it's a textured triangle on a textured
link |
00:21:09.440
background. Uh, and, um, you can check, does the surface settle first or does the contour settle
link |
00:21:16.880
first? Settle? Settle in the sense that the, so when you find, finally form the percept of the,
link |
00:21:25.920
of the, uh, triangle, you understand where the contours of the triangle are and you also know
link |
00:21:32.000
where the inside of the triangle is, right? That's when you form the final percept. Uh, now you can
link |
00:21:37.600
ask, what is the dynamics of forming that final percept? Um, do the, uh, do the neurons, um,
link |
00:21:47.200
first find the edges and converge on where the edges are and then they find the inner surfaces
link |
00:21:54.480
or does it go the other way around? So, so what's the answer? Uh, in this case, it, it turns out that
link |
00:22:00.640
it first settles on the edges. It converges on the edge hypothesis first, and then the, the
link |
00:22:07.600
surfaces are filled in from the edges to the inside. That's fascinating. Uh, and, and the detail to
link |
00:22:14.480
which you can study this, it's, it's amazing that you can actually not only find, um, the temporal
link |
00:22:20.320
dynamics of when this happens. Uh, uh, and then you can also find which layer in the, you know,
link |
00:22:26.080
in V1, which layer is encoding, uh, the edges, which layer is encoding the surfaces and, uh,
link |
00:22:34.000
which layer is encoding the feedback, which layer is encoding the feed forward and what,
link |
00:22:37.920
what's the combination of them that produces the final percept. Um, and these kinds of experiments
link |
00:22:43.760
stand out when you try to explain illusions. Uh, one, one example of a favorite illusion of
link |
00:22:49.680
mine is the Kanizsa triangle. I don't know whether you are familiar with this one. So this is, um,
link |
00:22:54.080
uh, this is an example where it's a triangle, uh, but, you know, the, the corners of the,
link |
00:23:00.080
only the corners of the triangle are shown in the stimulus. Uh, so they look
link |
00:23:04.560
like kind of Pacman. Oh, the black Pacman. Yeah. And then you start to see your visual system
link |
00:23:11.200
hallucinates the edges. Yeah. Um, and you can, you know, you, when you look at it, you will see a
link |
00:23:16.080
faint edge, right? And you can go inside the brain and look, you know, do actually neurons
link |
00:23:23.040
signal the presence of this edge. And, and if they signal, how do they do it? Because they are not
link |
00:23:29.600
receiving anything from the input. The input is blank for those neurons, right? Uh, so how do
link |
00:23:36.320
they signal it? When does the signaling happen? You know, does it, you know, so, so if a real
link |
00:23:41.520
contour is present in the input, then the, the neurons immediately signal, okay, there is a,
link |
00:23:47.520
there is an edge here. When, when it is an illusory edge, um, it is clearly not in the input. It is
link |
00:23:53.760
coming from the context. So those neurons fire later and, and you can say that, okay, these are,
link |
00:23:59.840
it's the feedback connections that is causing them to fire. Uh, and, and they happen later and you
link |
00:24:06.320
can find the dynamics of them. So, so these studies are pretty impressive and, and very detailed.
link |
00:24:13.200
So by the way, just, uh, just to step back, you said, uh, that there may be more feedback
link |
00:24:19.520
connections than feed forward connections. Yeah. Uh, first of all, if it's just for like a machine
link |
00:24:24.880
learning folks, I mean, that, that's crazy that there's all these feedback connections. I mean,
link |
00:24:32.000
we often think about, I think, thanks to deep learning, you start to think about, um, the human
link |
00:24:41.280
brain as a kind of feed forward mechanism. Right. Uh, so what the heck are these feedback connections?
link |
00:24:48.560
Yeah. What's their, what's the dynamics or what are we supposed to think about them?
link |
00:24:54.000
Yeah. So this is, this fits into a very beautiful picture about how the brain works, right? Um,
link |
00:24:59.360
so the, the beautiful picture of how the brain works is that our brain is building a model of the
link |
00:25:05.840
world. Uh, you know, so our visual system is building a model of how objects behave in the world. And,
link |
00:25:13.280
and we are constantly projecting that model back onto the world. So what we are seeing is not just
link |
00:25:19.600
a feed forward thing that just gets interpreted in a forward pass. We are constantly projecting
link |
00:25:25.280
our expectations onto the world. And the final percept is a combination of what we
link |
00:25:31.280
project onto the world, uh, combined with what the actual sensory input is. Uh,
link |
00:25:36.720
almost like trying to calculate the difference and then trying to interpret the difference.
link |
00:25:40.880
Yeah. It's, it's, um, I wouldn't put it as calculating the difference. It's more like,
link |
00:25:45.360
what is the best explanation for the input stimulus based on the model of the world I have.
link |
00:25:52.320
Got it. Got it. And that's where all the illusions come in. And that's,
link |
00:25:55.840
but that's, that's an incredibly efficient, so, uh, efficient process. So the feedback
link |
00:26:00.560
mechanism, it just helps you constantly. Uh, yeah. So hallucinate how the world should be
link |
00:26:07.280
based on your world model. And then just looking at, uh, if there's novelty, uh, like trying to
link |
00:26:15.120
explain it. Hence, that's why we would detect movement really well. There's all these
link |
00:26:20.800
kinds of things. And this is like at all different levels of the cortex you're saying that this
link |
00:26:26.880
happens at the lowest level, the highest level? Yes. Yeah. Feedback connections are prevalent
link |
00:26:32.240
everywhere in the cortex. And, and, um, so one way to think about it, and there's a lot of
link |
00:26:37.840
evidence for this is inference. Um, so, you know, so basically if you have a model of the world
link |
00:26:44.080
and when, when some evidence comes in, what you are doing is inference, right? You are trying to
link |
00:26:50.720
now explain this evidence using your model of the world. And this inference includes
link |
00:26:57.760
projecting your model onto the evidence and taking the evidence, uh, back into the model and, and
link |
00:27:04.160
doing an iterative procedure. Um, and, uh, this iterative procedure is what happens
link |
00:27:10.240
using the feed forward feedback propagation. Uh, and feedback affects what you see in the
link |
00:27:15.920
world and you know, it also affects feed forward propagation and examples are everywhere. We,
link |
00:27:21.360
we see these kinds of things everywhere. The idea that there can be multiple competing
link |
00:27:27.520
hypothesis, uh, in our model, trying to explain the same evidence and then you have to kind of
link |
00:27:34.720
make them compete and one hypothesis will explain away the other hypothesis through this competition
link |
00:27:41.360
process. Wait, what? So you have competing models of the world that try to explain,
link |
00:27:48.160
what do you mean by explain away? So this is a classic example in, uh, uh, graphical models,
link |
00:27:54.000
probabilistic models. Um, so if you, what are those? Um, okay. Um, I think it's useful to mention
link |
00:28:02.400
because we'll talk about them more. Yeah. Yeah. So neural networks are one class of machine
link |
00:28:09.280
learning models. Um, you know, you have distributed set of, uh, nodes, which are called the neurons,
link |
00:28:15.280
you know, each one is doing a dot product and you can, you can approximate any function
link |
00:28:18.960
using this, uh, multi level, uh, network of neurons. So that's, uh, uh, a class of models
link |
00:28:24.640
which are useful, useful for function approximation. There is another class of models
link |
00:28:29.280
in machine learning, uh, called probabilistic graphical models. And you can think of them as
link |
00:28:35.280
each node in that model is a variable, which is talking about something, you know,
link |
00:28:41.920
it can be a variable representing is, is an edge present in the input or not. Uh, and at the top
link |
00:28:49.440
of the, uh, network, a node can be representing, is there an object present in the
link |
00:28:56.640
world or not? And, and then, so it can, it is, it is another way of encoding knowledge and, um,
link |
00:29:04.960
and then you, once you encode the knowledge, you can, uh, do inference in the right way. You know,
link |
00:29:12.400
what is the best way to, uh, you know, explain some set of evidence using this model that you
link |
00:29:18.720
encoded, you know. So when you encode the model, you are encoding the relationship between these
link |
00:29:23.120
different variables. How is the edge connected to my, uh, the model of the object? How is the
link |
00:29:27.920
surface connected to the model of the object? Um, and then, um, of course, this is a very
link |
00:29:33.120
distributed, complicated model. And inference is how do you explain a piece of evidence when,
link |
00:29:40.160
when a set of stimulus comes in? If somebody tells me there is a 50% probability that there is an edge
link |
00:29:45.200
here in this part of the model, how does that affect my belief on whether I should think that
link |
00:29:51.200
there is a square present in the image? So, so this is the process of inference.
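A toy numerical version of the question just posed, for readers: a hypothetical two-variable model in which a square tends to produce an edge, and a lower-level node reports the edge with some confidence. All of the probabilities are invented for illustration; a real model like RCN has many variables and a much more elaborate inference scheme.

```python
# Hypothetical two-node model: Square (object) -> Edge (low-level feature).
P_square = 0.10                      # prior belief that a square is present
P_edge_given_square   = 0.90         # a square almost always produces this edge
P_edge_given_nosquare = 0.20         # clutter sometimes produces it too

def belief_in_square(p_edge_reported):
    """Update the square belief given a lower-level node that reports 'edge
    present' with probability p_edge_reported (soft evidence, treated here
    as a simple mixture over the edge / no-edge cases)."""
    like_square   = (p_edge_reported * P_edge_given_square
                     + (1 - p_edge_reported) * (1 - P_edge_given_square))
    like_nosquare = (p_edge_reported * P_edge_given_nosquare
                     + (1 - p_edge_reported) * (1 - P_edge_given_nosquare))
    num = P_square * like_square
    return num / (num + (1 - P_square) * like_nosquare)

print(belief_in_square(0.5))   # a maximally uncertain report leaves the 0.10 prior unchanged here
print(belief_in_square(1.0))   # a definite edge raises the belief from 0.10 to about 0.33
```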
link |
00:29:56.880
So one example of inference is having this explaining away effect between multiple causes. So, uh,
link |
00:30:04.000
graphical models can be used to represent causality in the world. Um, so let's say, um, you know,
link |
00:30:11.920
your alarm at home can be triggered by a burglar getting into your
link |
00:30:21.440
house, uh, or it can be triggered by an earthquake. Both, both can be causes of the alarm going off.
link |
00:30:27.920
So now you, you are, you know, you're in your office, you heard burglar alarm going off. You
link |
00:30:33.520
are heading home thinking that there's a burglar. But while driving home, if you hear on the radio
link |
00:30:39.840
that there was an earthquake in the vicinity, now your, you know, strength of evidence
link |
00:30:45.920
for, uh, a burglar getting into your house is diminished because now that, that piece of evidence
link |
00:30:52.240
is explained by the earthquake being present. So if you, if you think about these two causes
link |
00:30:57.680
explaining a lower-level variable, which is the alarm, now what we're seeing is that increasing
link |
00:31:03.920
the evidence for some cause, you know, there is evidence coming in from below for alarm being
link |
00:31:09.520
present. And initially it was flowing to a burglar being present, but now since somebody,
link |
00:31:16.000
some, there is side evidence for this other cause, it explains away this evidence and the
link |
00:31:21.360
evidence will now flow to the other cause. This is, you know, two competing causal, uh, things
link |
00:31:26.480
trying to explain the same evidence. And the brain has a similar kind of mechanism for, uh,
link |
00:31:31.360
for doing so. That's kind of interesting. I mean, and that, how's that all encoded in the brain?
link |
00:31:38.400
Like, where's the storage of information? Are we talking just maybe to get it, uh,
link |
00:31:43.600
a little bit more specific? Is it in the hardware of the actual connections? Is it in, uh, chemical
link |
00:31:50.080
communication? Is it electrical communication? Do we, do we know? So this is, you know, a paper
link |
00:31:56.000
that we are bringing out soon. Which one was this? Um, this is the cortical microcircuit paper
link |
00:32:01.200
that I sent you a draft of. Of course, a lot of it is still hypothesis. One hypothesis
link |
00:32:06.880
is that you can think of a cortical column as encoding a concept, you know,
link |
00:32:13.920
think of it as, um, an example of a concept is, is an edge present or not,
link |
00:32:22.160
or is, is an object present or not. Okay. So it can, you can think of it as a binary variable,
link |
00:32:27.360
a binary random variable, the presence of an edge or not, or the presence of an object or not.
link |
00:32:32.080
So each cortical column can be thought of as representing that one concept, one variable.
link |
00:32:38.080
And then the connections between these cortical columns are basically encoding the
link |
00:32:43.200
relationship between these random variables. And then there are connections within the
link |
00:32:47.920
cortical column. There are, each cortical column is implemented using multiple layers of neurons
link |
00:32:53.040
with very, very, very rich, um, structure there. You know, there are thousands of neurons in a
link |
00:32:59.200
cortical column, but, but that structure is similar across the different cortical columns.
link |
00:33:04.000
Yeah. Correct. And also these cortical columns collect, connect to a substructure called thalamus
link |
00:33:09.120
in the, uh, you know, so all cortical columns pass through this substructure.
link |
00:33:14.160
So our hypothesis is, is that the connections between the cortical columns implement this,
link |
00:33:20.160
uh, you know, that's where the knowledge is stored about, you know, how these different
link |
00:33:24.640
concepts, concepts connect to each other. And then the, the neurons inside this cortical column
link |
00:33:30.800
and in the thalamus in combination implement the, uh, actual computations needed for inference,
link |
00:33:38.000
which includes explaining away and competing between the different, uh, hypotheses.
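To make the explaining-away effect concrete for readers, here is a small, self-contained sketch of the alarm example from earlier, computed by brute-force enumeration over a three-variable network with burglary and earthquake as independent causes of the alarm. Every number is invented for illustration; this is the textbook effect, not the cortical circuit itself.

```python
from itertools import product

# Hypothetical priors and alarm model; all numbers invented for illustration.
P_burglary   = 0.01
P_earthquake = 0.02

def p_alarm(burglary, earthquake):
    """Probability the alarm goes off, given the two possible causes."""
    if burglary and earthquake:
        return 0.97
    if burglary:
        return 0.95
    if earthquake:
        return 0.30
    return 0.001

def posterior_burglary(alarm=True, earthquake_known=None):
    """P(burglary | alarm [, earthquake]) by summing over all joint states."""
    num = den = 0.0
    for b, e in product([True, False], repeat=2):
        if earthquake_known is not None and e != earthquake_known:
            continue  # condition on the known earthquake value
        joint = ((P_burglary if b else 1 - P_burglary)
                 * (P_earthquake if e else 1 - P_earthquake)
                 * (p_alarm(b, e) if alarm else 1 - p_alarm(b, e)))
        den += joint
        if b:
            num += joint
    return num / den

# Hearing the alarm alone makes a burglary look quite likely (about 0.58 here).
print(posterior_burglary())
# Learning there was an earthquake 'explains away' the alarm: the burglary
# belief collapses to roughly 0.03, exactly the effect described above.
print(posterior_burglary(earthquake_known=True))
```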
link |
00:33:43.680
And it is all very, so what is amazing is that, uh, neuroscientists have actually
link |
00:33:49.680
done experiments to the tune of showing these things. Uh, they might not be putting
link |
00:33:55.280
it in the overall inference framework, but they will show things like if I poke this higher level
link |
00:34:01.760
neuron, uh, it will inhibit through this complicated loop through the thalamus, it will inhibit this
link |
00:34:07.680
other column. Uh, so they will, they will do such experiments. Do they use terminology of concepts,
link |
00:34:14.080
for example? So, so you're, I mean, uh, is it, uh, is it something where it's easy to anthropomorphize
link |
00:34:23.600
and think about concepts? Like, uh, you started moving into logic based kind of reasoning systems.
link |
00:34:29.760
So, um, are we just thinking of concepts in that kind of way, or is it, uh, is it a lot messier,
link |
00:34:37.920
a lot more gray area, you know, even, even more gray, even more messy than, uh, the artificial
link |
00:34:45.600
neural network kinds of abstractions? The easiest way to think of it is as a variable,
link |
00:34:50.480
right? It's a binary variable, which is showing the presence or absence of something. But I guess
link |
00:34:56.240
what I'm asking is, is that something we're supposed to think of as something that's human
link |
00:35:02.400
interpretable? It doesn't need to be. It doesn't need to be human interpretable.
link |
00:35:06.960
There's no need for it to be human interpretable. Uh, but it's, it's almost like, um,
link |
00:35:13.760
you, you will be able to find some interpretation of it, uh, because it is connected to the other
link |
00:35:20.320
things that you know about. And the point is it's useful somehow. It's useful as an entity
link |
00:35:28.560
in the graph that, in connecting to the other entities that are, let's call them concepts.
link |
00:35:34.160
Right. Okay. So, uh, by the way, what's, are these the cortical micro circuits?
link |
00:35:39.840
Correct. These are the cortical micro circuits. You know, that's what neuroscientists use to
link |
00:35:44.000
talk about the circuits in, in, uh, within a level of the cortex. So you can think of,
link |
00:35:49.920
you know, let's think of in neural network, you know, artificial neural network terms,
link |
00:35:54.160
you know, people talk about the architecture, you know, so how many, how many layers
link |
00:35:58.560
they build, uh, you know, what is the fan in fan out, et cetera. That is the macro architecture.
link |
00:36:03.120
Um, so, and then within a layer of the neural network, you can, you know, the cortical neural
link |
00:36:11.280
network is much more structured within, you know, within a level, there's a lot more intricate
link |
00:36:15.920
structure there. Uh, but even, um, even within an artificial neural network, you can think of
link |
00:36:21.200
feature detection plus pooling as one level. And so that is kind of a microcircuit.
link |
00:36:26.480
Uh, it's much more, uh, complex in the real brain. Uh, and, uh, and so within a level,
link |
00:36:34.160
whatever is that circuitry within a column of the cortex and between the layers of the
link |
00:36:38.640
cortex, that's the micro circuitry. I love that terminology. Uh, machine learning people don't
link |
00:36:43.600
use the circuit terminology, but they should. It's, uh, it's a nice. So okay. Uh, okay. So that's, uh,
link |
00:36:50.720
uh, that's the, the, the cortical micro circuit. So what's interesting about, uh, what, what can
link |
00:36:56.720
we say? What does the paper that you're working on propose about the ideas around
link |
00:37:02.640
these cortical microcircuits? So this is a fully functional model for the microcircuits of the
link |
00:37:09.680
visual cortex. So the, the paper focuses on your idea and our discussions now is focusing on vision.
link |
00:37:15.040
Yeah. The, uh, visual cortex. Okay. Yeah. This is a model. This is a full model. This is,
link |
00:37:21.040
this is how vision works. But this is, this is a, yeah, model.
link |
00:37:26.400
Okay. So let me, let me step back a bit. Um, so we looked at neuroscience for insights on
link |
00:37:33.200
how to build a vision model, right? And, and, and we synthesized all those insights into a
link |
00:37:38.800
computational model. This is called the recursive cortical network model that we, we used for
link |
00:37:43.840
breaking captchas, and we are using the same model for robotic picking and, uh, tracking of
link |
00:37:50.320
objects. And that again is a vision system. That's a vision system. Computer vision system.
link |
00:37:54.400
That's a computer vision system. Takes in images and outputs what? On one side, it outputs the
link |
00:38:00.480
class of the image, uh, and also segments the image. Uh, and you can also ask it further queries.
link |
00:38:07.280
Where is the edge of the object? Where is the interior of the object? So, so it's a, it's a
link |
00:38:11.600
model that you build to answer multiple questions. So you're not trying to build a model for just
link |
00:38:17.120
classification or just segmentation, et cetera. It's a, it's a, it's a joint model that can do
link |
00:38:22.720
multiple things. Um, and, um, so, so that's the model that we built using insights from neuroscience.
link |
00:38:30.320
And some of those insights are what is the role of feedback connections? What is the role of lateral
link |
00:38:35.120
connections? Uh, so all those things went into the model. The model actually uses feedback connections.
link |
00:38:40.720
All these ideas from, from neuroscience. Uh, so what, what, what the heck is a recursive
link |
00:38:45.600
cortical network? Like what, what are the architecture approaches, interesting aspects here
link |
00:38:51.840
which is essentially a brain inspired approach to a computer vision?
link |
00:38:56.400
Yeah. So there are multiple layers to this question. I can go from the very, very top and
link |
00:39:01.760
then zoom in. Okay. So one important constraint that went into the model is that
link |
00:39:07.760
you should not think of vision as something in isolation. We should not think of
link |
00:39:13.520
perception as a preprocessor for cognition; perception and cognition are interconnected.
link |
00:39:21.600
And so you should not think of one problem in separation from the other problem. Um, and so
link |
00:39:26.800
that means if you finally want to have a system that understands concepts, uh, about the world and
link |
00:39:32.480
can learn a very conceptual model of the world and can reason and connect to language, all of
link |
00:39:37.840
those things, you need to think all the way through and make sure that your
link |
00:39:42.800
perception system is compatible with your cognition system and language system and all of them.
link |
00:39:48.080
And one aspect of that is top down controllability. Um, what does that mean? So that means, you
link |
00:39:55.600
know, so, so think of, you know, you can close your eyes and think about the details of one object.
link |
00:40:02.640
Right. I can, I can zoom in further and further. I can, you know, so, so think of the bottle in
link |
00:40:07.440
front of me, right? And now you can think about, okay, what the cap of that bottle looks like. Uh,
link |
00:40:14.000
You know, we can think about what's the texture on the cap of that bottle, you know, you can
link |
00:40:19.760
think about, you know, what will happen if, uh, something hits that. Uh, so you can, you can,
link |
00:40:25.520
you can manipulate your visual knowledge in, uh, cognition driven ways. Yes. Uh, and so
link |
00:40:33.520
this top down controllability, uh, and being able to simulate scenarios in the world.
link |
00:40:40.400
So you're not just a passive, uh, player in this perception game. You can, you can control it.
link |
00:40:47.280
You can, you, you have imagination. Correct. Correct. So, so, so basically, you know,
link |
00:40:52.160
basically having a generative network, uh, which is a model, and it is not just some arbitrary
link |
00:40:57.920
generative network. It has to be built in a way that it is controllable top down.
link |
00:41:02.800
It is, it is not just trying to generate a whole picture at once. Uh, you know, it's not trying
link |
00:41:08.000
to generate photorealistic things of the world. You, you know, you don't have good photorealistic
link |
00:41:12.480
models of the world. Human brains do not. If I, for example, ask you the question, uh,
link |
00:41:17.200
what is the color of the letter E in the Google logo? You have no idea. No idea.
link |
00:41:23.760
Although you have seen it millions of times or hundreds of times. So, uh, so it's not,
link |
00:41:29.280
our model is not photorealistic, but, but it is, but it has other properties that we can manipulate
link |
00:41:34.720
it, uh, in the, uh, and you can think about filling in a different color in that logo.
link |
00:41:39.280
You can think about expanding the, the letter E, you know, so you can imagine
link |
00:41:45.360
the consequence of, you know, actions that you have never performed. So, so these are the kind
link |
00:41:50.160
of characteristics the generative model needs to have. So this is one constraint that went into
link |
00:41:54.400
our model. Like, you know, so this is when you read the, just the perception side of the paper,
link |
00:41:59.040
it is not obvious that this was a constraint that went into the model, this top
link |
00:42:03.520
down controllability of the generative model. Uh, so what, what does top down controllability in a
link |
00:42:09.440
model look like? It's a really interesting concept, fascinating concept. What is that,
link |
00:42:15.840
is it that the recursiveness gives you that, or how do you, how do you do it?
link |
00:42:21.120
Quite a few things. It's like, what, what does the model factorize? You know, what are the,
link |
00:42:26.400
what is the model representing as different pieces in the puzzle? Like, you know, so,
link |
00:42:30.480
so in the RCN, uh, network, it thinks of the world, you know, such that, say, the background of
link |
00:42:37.440
an image is modeled separately from the foreground of the image. So, so the objects are separate
link |
00:42:43.360
from the background. They're different entities. So there's a kind of segmentation that's built
link |
00:42:47.360
in fundamentally. And then even that object is composed of parts. And also, another
link |
00:42:54.080
one is, the shape of the object, uh, is modeled differently from the texture of the object.
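A toy data-structure sketch of the factorization just described, for readers: background separate from foreground, objects composed of parts, and shape kept separate from texture. The names and fields are invented for illustration; they are not RCN's actual internals.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Part:
    shape_model: str      # e.g. a contour / edge-graph model for this part
    texture_model: str    # appearance is modeled separately from shape

@dataclass
class ForegroundObject:
    name: str
    parts: List[Part] = field(default_factory=list)  # objects are composed of parts

@dataclass
class Scene:
    background: str                   # the background has its own model,
    objects: List[ForegroundObject]   # independent of the objects in front of it

bottle = ForegroundObject("bottle", parts=[
    Part(shape_model="cap contour",  texture_model="ribbed plastic"),
    Part(shape_model="body contour", texture_model="clear plastic with label"),
])
scene = Scene(background="desk texture", objects=[bottle])

# Because the pieces are factored, top-down 'imagination' can change one factor
# while holding the others fixed, e.g. recolor a texture without touching shape.
bottle.parts[0].texture_model = "red plastic"
```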
link |
00:43:01.920
Got it. So there's like these, um... You know who François Chollet is,
link |
00:43:08.400
uh, he's, so he developed this like IQ test type of thing, the ARC challenge,
link |
00:43:16.400
and it's kind of cool that there's, um, these concepts, priors that he defines that you bring
link |
00:43:22.640
to the table in order to be able to reason about basic shapes and things in IQ tests.
link |
00:43:28.560
So here you're making it quite explicit that here, here are the things that you should be,
link |
00:43:34.800
these are like distinct things that you should be able to, um, model in this.
link |
00:43:40.000
Keep in mind that you, you can derive these from much more general principles. It doesn't,
link |
00:43:44.880
you don't need to explicitly put it as, oh, objects versus foreground versus background,
link |
00:43:49.840
uh, the surface versus texture. No, these are, these are derived from, uh, more fundamental
link |
00:43:55.440
principles of how, you know, what's the property of continuity of natural signals.
link |
00:44:02.480
What's the property of continuity of natural signals?
link |
00:44:05.360
Yeah.
link |
00:44:05.760
By the way, that sounds very poetic, but yeah. Uh, so you're saying that's a,
link |
00:44:10.800
there's some low level properties from which emerges the idea that shapes would be different
link |
00:44:15.440
than, like, uh, this should be a part of an object. There should be, I mean, kind of like
link |
00:44:20.560
François will talk about. I mean, there's objectness. There's all these things that it's kind of crazy
link |
00:44:25.840
that we humans, uh, I guess evolved to have because it's useful for us to perceive the world.
link |
00:44:31.920
Yeah. Correct. And it, it derives mostly from the properties of natural signals and, and so, um,
link |
00:44:38.880
natural signals. So natural signals are the kind of things we'll perceive in the, in the natural
link |
00:44:44.320
world. I don't know. I don't know why that sounds so beautiful. Natural signals. Yeah.
link |
00:44:48.560
As opposed to a QR code, right? Which is an artificial signal that we created.
link |
00:44:52.640
Humans are not very good at classifying QR codes. We are very good at saying something is a cat or
link |
00:44:57.600
a dog, but not very good at, you know, classifying QR codes, whereas computers are very good at
link |
00:45:01.920
classifying QR codes. Um, so our, our visual system is tuned for natural signals. Uh, and
link |
00:45:08.480
there are fundamental assumptions in the architecture that are derived from natural
link |
00:45:12.800
signals, uh, properties. I wonder, when you take, uh, hallucinogenic drugs, does that go into
link |
00:45:19.040
natural or is that closer to the QR code? It's still natural. It's still natural. Yeah. Because
link |
00:45:25.120
it's, it is still operating using our brains. By the way, on that, on that topic, I, I mean,
link |
00:45:30.000
I haven't been following. I think they're becoming legalized in certain places. I can't wait until they
link |
00:45:34.800
become legalized to a degree that, like, vision science researchers could study it. Yeah. Just
link |
00:45:40.640
like through, through medical chemical ways, modify. There could be ethical concerns, but
link |
00:45:47.600
modify. That's another way to study the brain is to be, be able to chemically modify it. It's
link |
00:45:53.280
probably, um, probably a very long way away to figure out how to do it ethically. Yeah. But I,
link |
00:45:59.840
I think there are studies on that already. Yeah. I think so. Uh, because it's, it's not unethical
link |
00:46:05.520
to give, uh, it to rats. Oh, that's true. That's true. There's a lot, there's a lot of
link |
00:46:13.040
drugged-up rats out there. Okay. Cool. Sorry. Sorry to... It's okay. So there's, uh, so there's
link |
00:46:18.480
these, uh, uh, low level, uh, things from natural signals that, uh, that, that can,
link |
00:46:26.480
from which these properties will emerge. Yes. Uh, but it is still a very hard problem on how to
link |
00:46:33.360
encode that. You know, so you don't, you know, there is no, uh, so, uh, you mentioned, um, the,
link |
00:46:38.960
the, the priors, uh, François wanted to encode in, uh, in the, uh, abstract reasoning challenge,
link |
00:46:44.960
but it is not straightforward how to encode those priors. Um, so, so some of those, uh, challenges
link |
00:46:51.040
like, you know, the object completion challenges are things that we purely use our
link |
00:46:56.240
visual system to do. It is, uh, it looks like abstract reasoning, but it is purely an output
link |
00:47:00.480
of the, the vision system. For example, completing the corners of that Kanizsa triangle,
link |
00:47:05.280
completing the lines of that Kanizsa triangle. It's purely a visual system property. You know,
link |
00:47:09.120
it's not, there is no abstract reasoning involved. It uses all these priors, but it is stored in our
link |
00:47:14.800
visual system in a particular way that is amenable to inference. And, and, and that is one of the
link |
00:47:21.920
things that we tackled in the, you know, basically saying, okay, these are the prior knowledge, uh,
link |
00:47:27.200
which, which will be derived from the world, but then how is that prior knowledge represented
link |
00:47:32.400
in the model such that inference when, when some piece of evidence comes in can be done very
link |
00:47:38.560
efficiently and in a very distributed way. Um, because it is very, there are so many ways of
link |
00:47:44.160
representing knowledge, which is not amenable to very quick inference, you know, quick lookups.
link |
00:47:50.000
Uh, and so that's one, um, core part of what we tackled in, uh, the RCN model. Um, uh,
link |
00:47:58.320
how do you encode visual knowledge to, uh, do very quick inference and yeah.
link |
00:48:02.800
Can you maybe comment on, uh, so folks listening to this and in general may be familiar with
link |
00:48:08.560
different kinds of architectures of a neural networks. What, what are we talking about with
link |
00:48:13.680
the RCN? Uh, what, what does, what does the architecture look like? What are different
link |
00:48:17.760
components? Is it close to neural networks? Is it far away from neural networks? What does it
link |
00:48:22.080
look like? Yeah. So, so you can, uh, think of the delta between the model and a convolutional
link |
00:48:27.920
neural network if, if people are familiar with convolutional networks. So convolutional networks
link |
00:48:32.720
have this feed forward processing cascade, which is called, uh, feature detectors and pooling.
link |
00:48:38.480
And that is repeated in the, in the hierarchy in a, in a, uh, multi level, uh, system. Um, and
link |
00:48:44.320
if you, if you want an intuitive idea of what, what is happening, feature detectors are,
link |
00:48:49.440
uh, you know, detecting interesting co occurrences in the input. It can be a line, a corner,
link |
00:48:56.640
a, an eye or a piece of texture, et cetera. And the pooling neurons are doing some local
link |
00:49:04.560
transformation of that and making it invariant to local transformations. So this is what the
link |
00:49:09.360
structure of convolutional neural network is. Um, recursive cortical network has a similar structure
link |
00:49:16.400
when you look at just the feed forward pathway. But in addition to that, it is also structured in
link |
00:49:21.200
a way that it is generative so that it can run it backward and combine the forward with the backward.
link |
00:49:28.400
Another aspect that it has is it has lateral connections. These lateral connections,
link |
00:49:35.440
um, which is between, so if you have an edge here and an edge here, it has connections between
link |
00:49:41.280
these edges. It is not just feed forward connections. It is, um, something between these edges,
link |
00:49:47.280
which is, uh, the, the nodes representing these edges, which is to enforce compatibility between
link |
00:49:52.160
them. So otherwise what will happen is that constraints, it's a constraint. It's basically,
link |
00:49:56.800
if you, if you do just feature detection followed by pooling, then your, your transformations in
link |
00:50:03.760
different parts of the visual field are not coordinated. Uh, and so you can, you will create a
link |
00:50:09.680
jagged, when you, when you generate from the model, you will create jagged, um, uh, things and
link |
00:50:15.200
uncoordinated transformations. So these lateral connections are enforcing coordination between the transformations.
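For readers who want the convolutional baseline being contrasted here in code: a minimal feature-detection-plus-pooling stage in plain NumPy, with invented filter values. The pieces that make RCN different, the generative backward pass and the lateral compatibility connections between neighboring features, are not shown; this is only the feed-forward half.

```python
import numpy as np

def detect_features(image, filters):
    """'Feature detection': correlate each small filter with every valid image
    patch and rectify, like a single convolutional layer."""
    fh, fw = filters.shape[1:]
    H, W = image.shape[0] - fh + 1, image.shape[1] - fw + 1
    maps = np.zeros((len(filters), H, W))
    for k, f in enumerate(filters):
        for i in range(H):
            for j in range(W):
                maps[k, i, j] = np.sum(image[i:i+fh, j:j+fw] * f)
    return np.maximum(maps, 0.0)

def pool(maps, size=2):
    """'Pooling': take the max over small neighborhoods, giving local
    invariance to small shifts of the detected feature."""
    K, H, W = maps.shape
    out = np.zeros((K, H // size, W // size))
    for i in range(H // size):
        for j in range(W // size):
            out[:, i, j] = maps[:, i*size:(i+1)*size, j*size:(j+1)*size].max(axis=(1, 2))
    return out

# Invented 3x3 edge detectors (vertical and horizontal).
filters = np.array([[[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
                    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]], dtype=float)
image = np.zeros((8, 8)); image[:, 4:] = 1.0        # a simple vertical edge
print(pool(detect_features(image, filters)).shape)  # (2, 3, 3)
```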
link |
00:50:21.840
Is the whole thing still differentiable? Uh, no. Okay. No. It's not, it's not trained using, uh,
link |
00:50:28.960
back prop. Okay. That's really important. So, uh, so there's these feed forward, there's feedback
link |
00:50:34.560
mechanisms. There's some interesting connectivity things. It's still layered like, uh, uh,
link |
00:50:40.160
multiple layers. Okay. Very, very interesting. Uh, and yeah. Okay. So the interconnection
link |
00:50:47.360
between, um, adjacent nodes, the connections across, serve as constraints that keep the thing
link |
00:50:54.400
stable. Correct. Okay. So what, what else? Uh, and then there's this idea of doing inference.
link |
00:51:01.120
A neural network does not do inference on the fly. So an example of why this inference is
link |
00:51:08.160
important is, you know, so one of the first applications, uh, that we showed in the paper
link |
00:51:13.440
was to crack, uh, text based CAPTCHAs. What are CAPTCHAs, by the way? Uh, yeah. By the way,
link |
00:51:20.960
one of the most awesome, like, people don't use this term anymore: human computation, I think.
link |
00:51:26.320
Uh, I love this term. The guy who created CAPTCHAs, I think, came up with this term.
link |
00:51:30.720
Yeah. I love it. Anyway. Uh, yeah. Uh, what, what are CAPTCHAs? So CAPTCHAs are those strings
link |
00:51:38.640
that you fill in, uh, when you're, you know, when, if you're opening a new account in Google,
link |
00:51:43.200
they show you a picture. Hey, you know, usually it used to be set of garbled letters, uh, that you
link |
00:51:48.880
have to kind of, uh, figure out what, what, what is that string of characters and type in. And the
link |
00:51:53.840
reason CAPTCHAs exist is because, you know, um, Google or Twitter do not want automatic creation
link |
00:52:01.920
of accounts. You can use a computer to create millions of accounts, uh, and, uh, use that for
link |
00:52:08.800
nefarious purposes. Uh, so you want to make sure that, to the extent possible, the interaction
link |
00:52:14.560
that, uh, their system is having is with a human. So it's a, it's called a human interaction proof.
link |
00:52:20.800
A CAPTCHA is a human interaction proof. Um, so, so CAPTCHAs are, by design,
link |
00:52:27.280
things that are easy for humans to solve, but hard for computers. Hard for robots. Yeah. Um, so,
link |
00:52:33.440
and text based CAPTCHAs, well, were the ones which were prevalent around 2014, because at that time
link |
00:52:41.040
text based CAPTCHAs were hard for computers to crack. Even now they are, actually, in the sense that
link |
00:52:47.600
an arbitrary text based CAPTCHA will be unsolvable even now. But with the techniques that we have
link |
00:52:53.520
developed, it can be, you know, you can quickly develop a mechanism that solves the CAPTCHA.
link |
00:52:58.880
They've probably gotten a lot harder too. They've been getting cleverer and cleverer at
link |
00:53:03.760
generating these texts. Yeah. So okay. So that was one of the things you've tested on, is these
link |
00:53:09.680
kinds of CAPTCHAs in 2014, '15, that kind of stuff. So what, uh, well, I mean, why, by the way, why
link |
00:53:17.680
CAPTCHAs? Yeah. Yeah. Even now I would say CAPTCHA is a very, very good challenge problem. Uh, if you
link |
00:53:24.560
want to understand how human perception works and if you want to build, uh, systems that work
link |
00:53:30.400
like the human brain. Uh, and I wouldn't say CAPTCHA is a solved problem. We have cracked
link |
00:53:35.680
the fundamental defense of CAPTCHAs, but it is not solved in the way that humans solve it.
link |
00:53:41.280
Um, so I can give you an example. I can, um, take a five year old child who has just learned
link |
00:53:47.280
characters, uh, and, uh, show them any new CAPTCHA that we create, they will be able to solve it.
link |
00:53:55.360
Uh, I can show you pretty much any new CAPTCHA from any new website. You'll be able to solve it
link |
00:54:01.440
without getting any training examples from that particular style of CAPTCHA. You're assuming
link |
00:54:06.320
I'm human. Yeah. Yes. Yeah. Uh, that's right. So if you are human, otherwise I will be able to figure
link |
00:54:13.120
that out using this one, but, uh, this whole podcast is just a Turing test. That's a long,
link |
00:54:19.120
a long Turing test. Anyway, I'm sorry. So yeah. So, as humans, humans can figure it out with very
link |
00:54:24.960
few examples or no training examples, like no training examples from that particular style of
link |
00:54:30.080
CAPTCHA. Um, and, and so, you know, uh, even now this is unreachable for, uh, the current
link |
00:54:37.360
deep learning system. So basically there is no, I don't think a system exists where you can basically
link |
00:54:41.680
say, train on whatever you want. And then now say, hey, I will show you a new CAPTCHA, which I did
link |
00:54:48.240
not show you in, in the, in the training setup. Will the system be able to solve it? Um, it still
link |
00:54:53.920
doesn't exist. So that is the magic of human perception. Yeah. And Doug Hofstadter, uh, put
link |
00:55:00.640
this, uh, very beautifully in, uh, one of his, uh, talks, the, the central problem in AI is what is
link |
00:55:08.560
the letter A. If you can, if you can build a system that reliably can detect all the variations of
link |
00:55:16.080
the letter A, you don't even need to go to the, the, the B and the C. Yeah. You don't even need
link |
00:55:21.600
to go to the B and the C or the strings of characters. And, uh, so that, that is the spirit
link |
00:55:26.320
in which, you know, with which we, uh, tackled that problem. What do you mean by that? I mean,
link |
00:55:30.320
is, is it, uh, like without training examples, try to figure out the fundamental, uh, elements
link |
00:55:38.880
that make up the letter A in all of its forms. In all of its forms, it can be, A can be made
link |
00:55:45.760
with the two humans standing, leaning against each other, holding the hands and, uh, it can
link |
00:55:50.560
be made of leaves. It can be. Yeah. You might have to understand, uh, everything about this world
link |
00:55:55.520
in order to understand letter A. Yeah. So it's common sense reasoning, essentially. Yeah. Right.
link |
00:56:00.880
So, so to really solve, to finally say that you have solved CAPTCHA, uh, you have
link |
00:56:08.080
to solve the whole problem. Yeah. Okay. So what, how does, uh, this kind of the RCN architecture
link |
00:56:15.680
help us to get, uh, do a better job of that kind of thing? Yeah. So, uh, as I mentioned, one of
link |
00:56:21.600
the important things was being able to do inference, being able to dynamically do inference. Can you,
link |
00:56:27.440
can you, uh, can you, uh, clarify what you mean? Cause could you say like neural networks don't do
link |
00:56:32.400
inference? Yeah. So what do you mean by inference in this context then? So, okay. So in CAPTCHAs,
link |
00:56:38.160
what they do to confuse people is to make these characters crowd together. Yes. Okay. And when
link |
00:56:44.480
you make the characters crowd together, what happens is that you will now start seeing
link |
00:56:49.120
combinations of characters as some other new character or, or an existing character. So you
link |
00:56:53.920
would, you would put an R and N together. It will start looking like an M. Uh, and, and so locally,
link |
00:57:00.720
they are, you know, there, there is very strong evidence for it being, uh, some, uh, incorrect
link |
00:57:07.600
character. But globally, the only explanation that fits together is something that is different
link |
00:57:14.080
from what you find locally. Yes. So, so, so this is inference. You are basically taking, uh, local
link |
00:57:21.200
evidence and putting it in the global context, and often coming to a conclusion which is
link |
00:57:28.240
conflicting with the local information. So actually, so you mean inference, like, uh, in the way it's
link |
00:57:34.480
used to, when you talk about reasoning, for example, uh, as opposed to like inference, which is this
link |
00:57:39.840
with, you know, with artificial neural networks, which is a single pass through the network. Okay.
link |
00:57:44.480
Okay. So like you're basically doing some basic forms of reasoning, like integration of like,
link |
00:57:51.120
how local things fit into the, the global picture. And, and, and things like explaining
link |
00:57:55.760
away comes into this one, because you are, you are, uh, explaining that piece of evidence,
link |
00:58:00.880
uh, as something else, uh, because globally, that's the only thing that makes sense. Um,
link |
00:58:06.080
so now, uh, you can amortize this inference by, you know, in a neural network, if you want to do
link |
00:58:13.680
this, what do you, you can, you can brute force it. You can just show it all combinations of things,
link |
00:58:19.280
that you want to, you want to, uh, your reasoning to work over and you can, you know, like just
link |
00:58:25.280
train the hell out of that neural network and it will look like it is doing, uh, you know,
link |
00:58:30.400
inference on the fly, but it is, it is really just doing amortized inference. It is because you,
link |
00:58:36.080
you have shown it a lot of these combinations during training time. Um, so what you want to do is
link |
00:58:42.640
be able to do dynamic inference rather than just being able to show all those combinations
link |
00:58:47.280
in the training time. And that's something we emphasized in the model.
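A minimal way to see the contrast being drawn here, with every function a hypothetical stand-in rather than anything from the paper: amortized inference bakes answers into a trained forward pass, while dynamic inference searches over explanations at test time against a generative model.

    # 'trained_net', 'render', 'likelihood', and 'candidate_strings' are invented
    # placeholders, not code from the RCN paper.
    def amortized_infer(image, trained_net):
        # One feed-forward pass: it can only look like reasoning for the kinds of
        # combinations that were already shown at training time.
        return trained_net(image)

    def dynamic_infer(image, candidate_strings, render, likelihood):
        # Test-time search over joint hypotheses (e.g. whole character strings),
        # scoring each by how well the generative model's rendering explains the pixels.
        best, best_score = None, float("-inf")
        for hypothesis in candidate_strings:
            score = likelihood(image, render(hypothesis))
            if score > best_score:
                best, best_score = hypothesis, score
        return best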
link |
00:58:51.760
What does it mean dynamic inference? Is that, that has to do with the feedback thing?
link |
00:58:56.160
Yes. Like what, what is dynamic? I'm trying to visualize what dynamic inference would be in
link |
00:59:01.840
this case. Like what is it doing with the input? It's shown the input the first time.
link |
00:59:07.680
Yeah. And, like, what's changing temporally? What's the dynamics of this
link |
00:59:13.840
inference process? So, so you can think of it as you have, um, at the top of the model,
link |
00:59:18.800
the characters that you are trained on, they are the causes that you're trying to explain the pixels
link |
00:59:25.040
using the characters as the causes. The, you know, the characters are the things that cause the pixels.
link |
00:59:32.960
Yeah. So there's this causality thing. So the reason you mentioned causality, I guess,
link |
00:59:37.600
is because there's a temporal aspect of this whole thing.
link |
00:59:40.720
In this particular case, the temporal aspect is not important. It is more like when,
link |
00:59:44.720
if, if I turn the character on the, the pixels will turn on. Yeah. It'll be after there's a
link |
00:59:50.400
little bit, but yeah. So that is the causality in the sense of like a logic causality, like
link |
00:59:55.520
hence inference. Okay. The dynamics is that, uh, even though locally it will look like, okay,
link |
01:00:02.480
this is an a, uh, and, and locally just when I look at just that patch of the image,
link |
01:00:08.960
it looks like an a, but when I look at it in the context of all the other causes,
link |
01:00:14.320
it might not, you know, a is not the something that makes sense. So that is something you have to
link |
01:00:18.240
kind of, you know, recursively figure out. Yeah. So, okay. So, uh, and, uh, this thing performed
link |
01:00:24.720
pretty well on the CAPTCHAs. Correct. And, uh, I mean, is there some kind of interesting intuition
link |
01:00:32.000
you can provide? Why it did well, like, what did it look like? Are there visualizations that
link |
01:00:37.280
could be human interpretable to us humans? Yes. Yeah. So the good thing about the model is that
link |
01:00:42.240
it is extremely, um, so it is not just doing a classification, right? It is, it is, it is, it is
link |
01:00:48.240
providing a full explanation for the scene. So when, when it, when it, uh, operates on a scene,
link |
01:00:55.040
it is coming back and saying, look, this part is the A, and these are the pixels that
link |
01:01:01.200
turned on, uh, these are the pixels in the input that make me think that it is an A, and
link |
01:01:08.320
also these are the portions I hallucinated. It, you know, it provides a complete explanation
link |
01:01:14.640
of that form. And then these are the contours. These are, this is the interior and this is
link |
01:01:20.720
in front of this other object. So that, that's the kind of, um, explanation it, uh, the inference
link |
01:01:27.040
network provides. So, so that, that is useful and interpretable.
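The kind of scene explanation being described could be pictured as a structured parse rather than a single label; the container below is a hypothetical illustration only (the field names are invented, not the RCN's actual output format):

    from dataclasses import dataclass, field
    from typing import List, Tuple

    Pixel = Tuple[int, int]

    @dataclass
    class ObjectExplanation:              # hypothetical field names, for illustration
        label: str                        # e.g. "A"
        evidence_pixels: List[Pixel]      # input pixels this object accounts for
        hallucinated_pixels: List[Pixel]  # parts the model filled in (e.g. occluded strokes)
        contour_pixels: List[Pixel]       # the object's boundary
        interior_pixels: List[Pixel]      # the object's surface / interior
        depth_order: int = 0              # smaller = closer; encodes "in front of this other object"

    @dataclass
    class SceneExplanation:
        objects: List[ObjectExplanation] = field(default_factory=list)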
link |
01:01:36.400
And then the kind of errors it makes are also, I don't want to, um, read too much into it, but the kind of errors
link |
01:01:44.240
the network makes are, uh, very similar to the kinds of errors humans would make in a, in a
link |
01:01:49.680
similar situation. So there's something about the structure that's, uh, feels reminiscent of the way
link |
01:01:54.320
the human, uh, visual system works. Well, I mean, uh, how hard coded is this to the CAPTCHA problem?
link |
01:02:03.200
This idea? Uh, not really hard coded because it's the, uh, the assumptions as I mentioned are
link |
01:02:08.560
general, right? It is more, um, and, and those themselves can be applied in many situations
link |
01:02:14.560
which are natural signals. Um, so it's, it's the foreground versus, uh, background factorization
link |
01:02:20.640
and, uh, the factorization of the surfaces versus the contours. So these are all generally
link |
01:02:26.400
applicable assumptions in all of vision. So why, why CAPTCHAs, why attack the CAPTCHA problem,
link |
01:02:34.240
which is quite unique in the computer vision context versus like the traditional benchmarks
link |
01:02:39.680
of image net and all those kinds of image classification or even segmentation tasks,
link |
01:02:44.880
all that kind of stuff. Do you feel like that's, uh, I mean, what, what's your thinking about
link |
01:02:49.200
those kinds of benchmarks in, um, in this, in this context? I mean, those benchmarks are useful
link |
01:02:55.120
for deep learning kind of algorithms where you, you know, so the, the settings, uh, that deep
link |
01:03:00.720
learning works in are: here is my huge training set and here is my test set. So the, the training
link |
01:03:07.040
set is almost, uh, you know, 100x, 1000x bigger than, uh, the test set in many, many, many cases.
link |
01:03:14.320
What we wanted to do was invert that. The training set is much smaller than the, the test set.
link |
01:03:21.680
Yes. Uh, and, uh, uh, and, you know, uh, CAPTCHA is a problem that is by definition
link |
01:03:29.760
hard for computers and it has these good properties of strong generalization, strong
link |
01:03:35.840
out of training distribution generalization. If you are interested in studying that, uh, and putting,
link |
01:03:42.480
having your model have that property, then it's a, it's a good data set to tackle.
link |
01:03:46.720
So is there, have you attempted to, which I think I believe there's quite a growing body of work
link |
01:03:53.520
on looking at MNIST and ImageNet without training. So like taking, like the basic challenge is how,
link |
01:04:02.080
what tiny fraction of the training set can we take in order to do a reasonable job
link |
01:04:08.720
of the classification task? Have, have you explored that angle in these classic benchmarks?
link |
01:04:14.960
Yes. So, so we did do MNIST. So, um, you know, so it's not just CAPTCHAs. Uh, so there were,
link |
01:04:20.240
uh, also multiple versions of MNIST, including the, the standard
link |
01:04:26.560
version, which, where we inverted the problem, which is basically saying, rather than train on
link |
01:04:30.800
60,000, uh, training data, uh, you know, how, uh, quickly can you get, uh, to high level accuracy
link |
01:04:38.320
with very little training data?
link |
01:04:39.680
Was, is there some, uh, performance that you remember? Like how well, how well did it do?
link |
01:04:45.200
How many examples did it need?
link |
01:04:47.360
Yeah. I, I, you know, I remember that it was, you know, uh, on the order of, uh,
link |
01:04:55.120
tens or hundreds of examples to get into, uh, 95% accuracy. And it was, it was definitely
link |
01:05:00.880
better than the other systems out there at that time.
link |
01:05:03.840
At that time. Yeah.
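A hedged sketch of that "inverted" evaluation protocol, with the dataset, model, and helper functions all as stand-ins: subsample a handful of training examples per class and report accuracy on the full test set as the training budget grows.

    import random

    # Hypothetical protocol sketch: instead of training on all 60,000 digits, train
    # on only a handful per class and report test accuracy as a function of the
    # training budget.  'fit' and 'evaluate' stand in for whatever model and metric
    # are being tested; nothing here is from the actual experiments.
    def low_data_protocol(train_set, test_set, fit, evaluate, shots=(1, 5, 10, 100), seed=0):
        rng = random.Random(seed)
        by_class = {}
        for x, y in train_set:                       # train_set: iterable of (image, label)
            by_class.setdefault(y, []).append(x)
        results = {}
        for k in shots:
            subset = [(x, y) for y, xs in by_class.items()
                      for x in rng.sample(xs, min(k, len(xs)))]
            model = fit(subset)                      # train on the tiny subset only
            results[k] = evaluate(model, test_set)   # e.g. accuracy on the full test set
        return results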
link |
01:05:04.720
Yeah. They're really pushing it. I think that's a really interesting space, actually.
link |
01:05:08.320
Uh, I think there's an actual name for MNIST that, uh, like there's different names to the
link |
01:05:16.800
different sizes of training sets. I mean, people are like attacking this problem. I think it's
link |
01:05:21.600
super interesting. Yeah.
link |
01:05:22.720
It's funny how like the MNIST will probably be with us all the way to AGI as the data set that
link |
01:05:30.720
just sticks around. It is, it's a clean, simple, uh, data set to, uh, to study the fundamentals
link |
01:05:37.520
of learning with just like CAPTCHAs. It's interesting. Not enough people, I don't know,
link |
01:05:42.800
maybe you can correct me, but I feel like CAPTCHAs don't show up as often in papers as they probably
link |
01:05:47.840
should. That's correct. Yeah. Because, you know, um, usually these things have a momentum, uh,
link |
01:05:53.600
you know, once, once, uh, something gets established as a standard benchmark.
link |
01:05:58.880
Yeah. That is a, there is a, uh, there is a dynamics of, uh, how graduate students operate
link |
01:06:04.560
and how the academic system works that, uh, pushes people to track that, uh, benchmark.
link |
01:06:10.720
Yeah. To folk.
link |
01:06:12.000
Yeah. So nobody wants to think outside the box. Okay.
link |
01:06:16.640
Okay. So good performance on the CAPTCHAs. What else is there interesting, um,
link |
01:06:22.560
on the RCN side before we talk about the cortical microcircuits?
link |
01:06:25.520
Yeah. So the, the same model, so the, the, the important part of the model was that it
link |
01:06:31.120
trains very quickly with very little training data and it's, uh, you know, quite robust to
link |
01:06:36.560
out of distribution, uh, perturbations. Um, and, uh, and we are using that, uh, very, uh, fruitfully
link |
01:06:44.240
at, uh, Vicarious in many of the robotics tasks we are solving.
link |
01:06:48.000
So, you know, let me ask you this kind of touchy question. I have to, I've spoken with, uh, your
link |
01:06:55.040
friend, colleague, Jeff Hawkins, too. I mean, he's, uh, I have to kind of ask, there is a bit,
link |
01:07:01.920
whenever you have brain inspired stuff and you make big claims, uh, big sexy claims,
link |
01:07:08.400
there's a, you know, uh, there's critics, I mean, machine learning subreddit.
link |
01:07:14.480
Don't get me started on those people. Uh, they're harsh. I mean, criticism is good,
link |
01:07:19.120
but they're a bit, uh, they're a bit over the top. Um, there is quite a bit of sort of skepticism
link |
01:07:25.440
and criticism. You know, is this work really as good as it promises to be? Yeah. Do you have thoughts
link |
01:07:32.000
on that kind of skepticism? Do you have comments on the kind of criticism you might have received,
link |
01:07:37.840
uh, about, you know, is this approach legit? Is this, is this a promising approach? Yeah.
link |
01:07:44.480
Or at least as promising as it seems to be, you know, advertised as? Yeah, I can comment on it.
link |
01:07:50.960
Um, so, you know, our, uh, our paper is, uh, published in Science, which I would argue
link |
01:07:56.560
is a very high quality journal, very hard to, uh, publish in, and, you know, usually it is
link |
01:08:02.160
indicative of the, of the quality of the work. And, um, uh, I can, I can, I am very, very certain that
link |
01:08:10.160
the ideas that we brought together in that paper, uh, in terms of the importance of feedback connections,
link |
01:08:15.120
uh, recursive inference, lateral connections, uh, coming to best explanation of the scene as the
link |
01:08:21.440
problem to solve, trying to solve, uh, recognition, segmentation, uh, all jointly in a way that is
link |
01:08:28.720
compatible with higher level cognition, top down attention, all those ideas that we brought
link |
01:08:32.880
together into something, you know, coherent and workable in the, uh, in the world, and
link |
01:08:36.960
tackling a challenging problem. I think that will, that will stay, and that, that
link |
01:08:41.920
contribution I stand by, right? Now, uh, I can, I can tell you a story, uh, which is funny in the,
link |
01:08:48.560
in the context of this, right? Um, so if you read the abstract of the paper and like, you know,
link |
01:08:52.640
the argument we are putting in, you know, we are putting in, look, current deep learning systems
link |
01:08:56.720
take a lot of training data. Uh, they don't use these insights and here is our new model,
link |
01:09:02.400
which is not a deep neural network. It's a graphical model. It does inference. This is
link |
01:09:05.680
what, how the paper is framed, right? Now, once the paper was accepted and everything, um, it went
link |
01:09:11.200
to the press department in, in Science, you know, Science's press office. We, we didn't do any
link |
01:09:15.840
press release when it was published. It was, it went to the press department. What did the,
link |
01:09:19.600
what was the press release that they wrote up? A new deep learning model.
link |
01:09:24.880
Solves CAPTCHAs. Solves CAPTCHAs. And, uh, so, so you can see, you know, what was being
link |
01:09:30.720
hyped in that, uh, thing, right? So there is the, there is a dynamic in the, uh, in the community
link |
01:09:38.400
of, you know, so, uh, um, that especially happens when there are lots of new people
link |
01:09:43.840
coming into the field and they get attracted to one thing and some people are trying to think
link |
01:09:48.480
different compared to that. So there is, there is some, uh, I think skepticism in science is
link |
01:09:54.000
important and it is, um, you know, very much, uh, required, but it's also, it's not, uh, skepticism
link |
01:10:01.120
usually, it's mostly a bandwagon effect that is happening rather than that. Well, well, but that's
link |
01:10:06.080
not even that. I mean, I'll tell you what they react to, which is like, uh, I'm sensitive to as
link |
01:10:11.840
well. If you, if you look at just companies, OpenAI, DeepMind, um, Vicarious, I mean, they just,
link |
01:10:18.080
there's, uh, there's a little bit of a race to the top and hype, right? It's, it's like,
link |
01:10:27.120
it doesn't pay off to be humble. So like, uh, and, and the press is just, uh, irresponsible
link |
01:10:36.480
often they, they just, I mean, don't get me started on the state of journalism today.
link |
01:10:41.200
Like it seems like the people who write articles about these things, they literally have not even
link |
01:10:46.480
spent an hour on the Wikipedia article about what neural networks are. Like they haven't, like,
link |
01:10:52.480
invested in even just the language. The laziness. It's like, uh, robots beat humans. Like they,
link |
01:11:03.280
they write this kind of stuff that just, uh, and then, and then of course the researchers are
link |
01:11:08.480
quite sensitive to that, uh, because it gets a lot of attention. They're like, why did this
link |
01:11:13.040
work get so much attention? Uh, you know, that's, that's over the top and people get really sensitive,
link |
01:11:18.880
you know, the same kind of criticism with, um, OpenAI's work with the Rubik's cube with
link |
01:11:23.920
the robot that people criticized, uh, same with GPT two and three. They criticize, uh, same thing
link |
01:11:30.640
with, uh, DeepMind's AlphaZero. I mean, yeah, I, I'm sensitive to it. Um, but, and of
link |
01:11:38.240
course with your work, it mentioned deep learning, but there's something super sexy to the public
link |
01:11:43.360
about brain inspired. I mean, that immediately grabs people's imagination, not even like
link |
01:11:50.000
neural networks, but like really brain inspired. Like brain, like brain like neural networks,
link |
01:11:57.360
that seems really compelling to people. And, um, to me as well, to, to the world as a narrative.
link |
01:12:03.360
And so, uh, people hook up, hook on to that. And, uh, sometimes you, uh, the skepticism engine turns
link |
01:12:11.920
on in the research community and they're skeptical, but I think putting aside the ideas of the actual
link |
01:12:19.520
performance on CAPTCHAs or performance on any data set, I mean, to me, all these data sets are
link |
01:12:25.760
useless anyway. It's nice to have them. Uh, but in the grand scheme of things, they're silly toy
link |
01:12:31.040
examples. The point is, is there intuition about the, the idea is just like you mentioned,
link |
01:12:38.480
bringing the ideas together in the unique way. Is there something there? Is there some value
link |
01:12:43.600
there? And is it going to stand the test of time? Yes. And that's the hope. That's the hope.
link |
01:12:47.680
Yes. Uh, I'm, my confidence there is very high. I, you know, I don't treat brain inspired as a
link |
01:12:54.080
marketing term. Uh, you know, I am looking into the details of biology and, and puzzling over,
link |
01:13:02.560
uh, those things. And I am, I am grappling with those things. And so it is, it is not a marketing
link |
01:13:07.840
term at all. It, you know, you can use it as a marketing term and, and people often use it.
link |
01:13:12.160
And you can get lumped in with them. And when, when people don't understand how we are approaching
link |
01:13:17.360
the problem, it is, it is easy to be, uh, misunderstood and, you know, think of it as,
link |
01:13:22.640
you know, purely, uh, marketing, but that's not the way, uh, we are. So you really,
link |
01:13:28.000
I mean, as a scientist, you believe that if we kind of just stick to really understanding the brain,
link |
01:13:35.280
that's going to, that's the right, like you, you should constantly meditate on the,
link |
01:13:40.720
how does the brain do this? Because that's going to be really helpful for engineering
link |
01:13:44.960
intelligent systems. Yes. You need to, so I think it is, it's one input and it is, it is helpful,
link |
01:13:51.280
but you, you should know when to deviate from it too. Um, so an example is convolutional neural
link |
01:13:58.480
networks, right? Uh, convolution is not an operation the brain, uh, implements. The visual
link |
01:14:05.360
cortex is not convolutional. Visual cortex has local receptive fields, local connectivity,
link |
01:14:11.280
but, you know, the, um, there is no translation invariance in the, um,
link |
01:14:18.080
uh, the network weights, um, in, in the visual cortex, that is a, uh, computational
link |
01:14:25.200
trick, which is a very good engineering trick that we use for sharing the training between
link |
01:14:29.920
the different, uh, nodes. Um, so, uh, and, and that trick will be with us for some time. It will
link |
01:14:35.760
go away when we have, um, uh, uh, robots with eyes and heads that move. Uh, and so then the,
link |
01:14:44.400
that trick will go away. It will not be, uh, useful at that time. So,
link |
01:14:48.320
uh, so the brain doesn't, so the brain doesn't have translational invariance. It has the focal
link |
01:14:53.920
point. Like it has a thing it focuses on. Correct. It has, it has a fovea, and, and because of the
link |
01:14:58.640
fovea, um, the, the receptive fields are not like the copying of the weights. Like the, the,
link |
01:15:04.800
the weights in the center are very different from the weights in the periphery.
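A toy contrast of the two schemes being discussed, with made-up array shapes and no claim of biological accuracy: a convolutional layer reuses one kernel at every location (the weight-sharing engineering trick), while a locally connected layer lets center and periphery have entirely different weights.

    import numpy as np

    def conv_layer(image, kernel):
        # Weight sharing: the SAME kernel is applied at every location, which is the
        # engineering trick (translation invariance), not what cortex does.
        H, W = image.shape
        h, w = kernel.shape
        out = np.zeros((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
        return out

    def locally_connected_layer(image, kernels):
        # Local receptive fields WITHOUT weight sharing: 'kernels' has shape
        # (H - h + 1, W - w + 1, h, w), one separate filter per location, so the
        # weights near the center are free to differ from those in the periphery.
        H, W = image.shape
        h, w = kernels.shape[2], kernels.shape[3]
        out = np.zeros((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + h, j:j + w] * kernels[i, j])
        return out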
link |
01:15:07.680
Yes. At the periphery. I mean, I, uh, actually wrote a paper and just got a chance
link |
01:15:13.600
to really study peripheral vision, which is a fascinating thing. A very poorly understood
link |
01:15:21.840
thing of what the brain, you know, at every level the brain does with the periphery. It does some
link |
01:15:28.480
funky stuff. Yeah. So it's, uh, it's another kind of trick than, uh, convolutional. Like it does,
link |
01:15:35.280
it, uh, it's a, you know, convolution, convolution in neural networks is a trick
link |
01:15:42.720
for efficiency, an efficiency trick. And the brain does a whole other kind of thing.
link |
01:15:47.040
Correct. Correct. So, so you need to understand the principles of processing so that you can
link |
01:15:53.200
still apply engineering tricks where you want them to be. You don't want to be slavishly
link |
01:15:57.680
mimicking all the things of the brain. Um, and, and so yeah, so it should be one input. And I
link |
01:16:02.720
think it is extremely helpful. Uh, but you, it should be the point of really understanding so
link |
01:16:08.400
that you know when to deviate from it. So, okay. That's really cool. That's worked from a few years
link |
01:16:14.800
ago. So you, uh, you did work at Numenta with Jeff Hawkins. Yeah. Uh, with, uh, hierarchical
link |
01:16:22.000
temporal memory. How is your just, if you could give a brief history, how is your view of the way
link |
01:16:30.000
the models of the brain changed over the past few years leading up to, to now? Is there some
link |
01:16:36.160
interesting aspects where there was an adjustment to your understanding of the brain or is it all
link |
01:16:42.000
just building on top of each other? In terms of the higher level ideas, uh, especially the ones
link |
01:16:47.760
Jeff wrote about in the book, you know, On Intelligence, right,
link |
01:16:52.400
if you, if you blur out the details and, and if you just zoom out to the
link |
01:16:56.720
higher level idea, uh, things are, I would say consistent with what he wrote about, but,
link |
01:17:02.560
but many things will be consistent with that because it is, it's a blur, you know, when you,
link |
01:17:05.680
when you, you know, deep learning systems are also, you know, multilevel hierarchical, all of
link |
01:17:10.960
those things, right? So, so at the, but, um, in terms of the detail, a lot of things are different,
link |
01:17:18.080
uh, and, and, and those details matter a lot. Um, so, so one point of difference I had with Jeff,
link |
01:17:25.680
uh, uh, was, uh, how to approach, you know, how much biological plausibility and realism
link |
01:17:33.600
do you want in the learning algorithms? Um, so, uh, when I was there, uh, this was, you know,
link |
01:17:40.880
almost 10 years ago now. So, yeah, I don't know, I don't know what Jeff thinks now, but 10 years
link |
01:17:47.280
ago, uh, the difference was that I did not want to be so constrained on saying, uh, my learning
link |
01:17:55.200
algorithms need to be biologically plausible, um, based on some filter of biological
link |
01:18:01.040
plausibility available at that time. To me, that is a dangerous cut to make, because we are, you know,
link |
01:18:07.920
discovering more and more things about the brain all the time, new biophysical mechanisms, new
link |
01:18:12.320
channels, uh, are being discovered all the time. So I don't want to upfront kill off and, uh, a
link |
01:18:18.880
learning algorithm just because we don't really understand the full, uh, biophysics
link |
01:18:25.920
or whatever of how the brain learns. Exactly. Exactly. But let me ask, sort of on that topic,
link |
01:18:30.800
like what's our, what's your sense? What's our best understanding of how the brain learns?
link |
01:18:36.560
So things like back propagation, credit assignment. So, so many of these algorithms
link |
01:18:42.640
have things in common, right? Back propagation is one way of
link |
01:18:48.080
credit assignment. There is another algorithm called expectation maximization, which is,
link |
01:18:53.040
you know, another weight adjustment algorithm. But is it your sense the brain does something
link |
01:18:58.000
like this? Has to. There is no way around it in the sense of saying that you do have to adjust the
link |
01:19:05.280
connections. So yeah. And you're saying credit assignment, you have to reward the
link |
01:19:08.720
connections that were useful in making a correct prediction and not, yeah, I guess,
link |
01:19:13.200
backprop, but yeah, it doesn't have to be differentiable. I mean, it doesn't have to be
link |
01:19:18.240
differentiable. Yeah. But you have to have a, you know, you have a model that you start with,
link |
01:19:24.240
you have data comes in and you have to have a way of adjusting the model such that it better
link |
01:19:30.880
fits the data. Yeah. So that, that is all of learning, right? And some of them can be using
link |
01:19:36.320
backprop to do that. Some of it can be using, you know, very local graph changes to do that.
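A hedged illustration of the shared skeleton being described: a model, incoming data, and some rule that adjusts the model to better fit the data. The two update rules below (a gradient-style rule and a purely local, non-differentiable one) are generic textbook examples, not claims about the brain or about Vicarious's algorithms.

    # A tiny linear model: 'weights' is a list of floats, each data point is
    # (x, y) with x a list of input features and y a +1/-1 target.
    def learn(weights, data, update_rule, epochs=10):
        for _ in range(epochs):
            for x, y in data:
                weights = update_rule(weights, x, y)
        return weights

    def gradient_rule(w, x, y, lr=0.1):
        # Gradient-style credit assignment on squared error: dE/dw_i = err * x_i.
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        return [wi - lr * err * xi for wi, xi in zip(w, x)]

    def local_rule(w, x, y, lr=0.1):
        # Perceptron-style rule: not differentiable, and each weight change uses
        # only locally available information (its own input and the error sign).
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
        if pred == y:
            return w
        return [wi + lr * y * xi for wi, xi in zip(w, x)]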
link |
01:19:45.440
There can be, you know, many of these learning algorithms have similar update properties locally
link |
01:19:53.920
in terms of what the neurons need to do locally. I wonder if small differences in learning
link |
01:19:59.280
algorithms can have huge differences in the actual effect. So the dynamics of,
link |
01:20:03.520
I mean, sort of the reverse like spiking, like if credit assignment is like a lightning versus
link |
01:20:13.520
like a rainstorm or something, like whether, whether there's like a looping local type of
link |
01:20:22.320
situation with the credit assignment, whether there is like regularization, like how,
link |
01:20:29.920
how, how it injects robustness into the whole thing, like whether it's chemical or electrical
link |
01:20:38.640
or mechanical. Yeah. All those kinds of things. Yes. I feel like it, that,
link |
01:20:45.760
yeah, I feel like those differences could be essential, right? It could be. It's just that
link |
01:20:51.040
you don't know enough to, on the learning side, you don't know enough to say that is definitely
link |
01:20:59.200
not the way the brain does it. Got it. So you don't want to be stuck to it. So that, yeah. So
link |
01:21:04.400
you've been open minded on that side of things. On the inference side, on the recognition side,
link |
01:21:09.120
I am much more amenable to being constrained because it's much easier to do experiments
link |
01:21:14.800
because, you know, it's like, okay, here's the stimulus. You know, how many steps did it get
link |
01:21:18.480
to take the answer? I can trace it back. I can, I can understand the speed of that computation,
link |
01:21:24.320
et cetera, much more readily on the inference side. Got it. And then you can't do good experiments
link |
01:21:29.920
on the learning side. Correct. So that, let's, let's go right into the cortical microcircuits
link |
01:21:36.640
right back. So what, what are these ideas beyond recursive cortical network that you're looking
link |
01:21:44.160
at now? So we have made a, you know, pass through multiple of the steps that, you know, as I mentioned
link |
01:21:52.080
earlier, you know, we were looking at perception from the angle of cognition, right? It was not
link |
01:21:56.720
just perception for perception sake. How do you, how do you connect it to cognition? How do you
link |
01:22:01.920
learn concepts? And how do you learn abstract reasoning? Similar to some of the things Francois
link |
01:22:08.720
talked about, right? So, so we have taken one pass through it, basically saying,
link |
01:22:16.320
what is the basic cognitive architecture that you need to have, which has a perceptual system,
link |
01:22:22.880
which has a system that learns dynamics of the world, and then has something like a routine
link |
01:22:29.840
program learning system on top of it to learn concepts. So we have, we have built one, the,
link |
01:22:35.440
you know, the version 0.1 of that system. This was another Science Robotics paper. It is,
link |
01:22:41.680
it's the title of that paper was, you know, something like cognitive programs. How do you build
link |
01:22:46.400
cognitive programs? And, and the application there was on manipulation, robotic manipulation?
link |
01:22:53.440
It was, it was, so think of it like this. Suppose you wanted to tell a new person
link |
01:23:01.200
that you met, you don't know the language that person uses, and you want to communicate to that
link |
01:23:05.840
person to achieve some task, right? So I want to say, hey, you need to pick up all the red
link |
01:23:13.840
cups from the kitchen counter, and put it here, right? How do you communicate that, right? You
link |
01:23:19.360
can show pictures, you can basically say, look, this is the starting state, the things are here,
link |
01:23:25.600
this is the ending state. And, and what does the person need to understand from that, the person
link |
01:23:30.400
needs to understand what conceptually happened in those pictures from the input to the output,
link |
01:23:34.960
right? So, so we are looking at preverbal conceptual understanding without language.
link |
01:23:42.480
How do you, how do you have a set of concepts that you can manipulate in your head? And from a
link |
01:23:50.000
set of images of input and output, can you infer what is happening in those images?
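As a toy, hypothetical version of that inference-from-before-and-after-images idea (the scene encoding and the tiny instruction set here are invented; the actual cognitive-programs system is far richer), one could search for the shortest program consistent with the example pairs:

    from itertools import product

    # A scene here is a dict mapping position -> color, e.g. {(0, 0): "red", (2, 1): "blue"}.
    def move_all(scene, color, dest):
        # Crude toy operation: collapse everything of one color onto a single
        # destination cell (just enough to make the search below concrete).
        out = {pos: c for pos, c in scene.items() if c != color}
        if any(c == color for c in scene.values()):
            out[dest] = color
        return out

    def induce_program(examples, colors, positions, max_len=2):
        # Return the shortest sequence of (color, dest) moves that maps every
        # 'before' scene to its 'after' scene in the given example pairs.
        ops = list(product(colors, positions))
        for length in range(1, max_len + 1):
            for program in product(ops, repeat=length):
                if all(_run(program, before) == after for before, after in examples):
                    return program
        return None

    def _run(program, scene):
        state = dict(scene)
        for color, dest in program:
            state = move_all(state, color, dest)
        return state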
link |
01:23:56.800
Got it. With concepts that are pre language. Okay. So what does it mean for concept to be
link |
01:24:01.360
pre language? Like, yeah, why so why is language so important here? So I want to make a distinction
link |
01:24:12.080
between concepts that are just learned from text by just just feeding brute force text.
link |
01:24:20.560
You can, you can start extracting things like, okay, cow is likely to be on grass.
link |
01:24:26.880
So those kinds of things, you can extract purely from text. But that's kind of a simple
link |
01:24:34.640
association thing rather than a concept as an abstraction of something that happens in the
link |
01:24:40.000
real world, you know, in a grounded way, that I can, I can simulate it in my mind and connect it
link |
01:24:46.800
back to the real world. And you think kind of the visual, the visual world concepts in the visual
link |
01:24:53.360
world are somehow lower level than just the language. The lower level kind of makes it feel
link |
01:25:00.720
like, okay, that's, like, unimportant. It's more like, I would say, the concepts in the visual
link |
01:25:10.160
and the motor system and, you know, the concept learning system, which if you cut off the language
link |
01:25:17.200
part, just the just what we learn by interacting with the world and abstractions from that,
link |
01:25:21.920
that is a prerequisite for any real language understanding.
link |
01:25:26.480
So you're, so you disagree with Chomsky, because he says language is at the bottom of everything.
link |
01:25:32.080
No, I, I, yeah, I disagree with Chomsky completely from universal grammar to, yeah.
link |
01:25:39.760
So that was a paper in Science, beyond the recursive cortical network.
link |
01:25:44.080
What, what other interesting problems are there, the open problems and brain inspired
link |
01:25:49.120
approaches that you're thinking about? I mean, everything is open, right? Like,
link |
01:25:53.760
you know, no, no problem is fully solved, right? I think of perception as kind of the
link |
01:26:01.600
first thing that you have to build, but the last thing that will actually be solved.
link |
01:26:09.680
So, because if you do not build perception system in the right way, you cannot build
link |
01:26:15.040
concept system in the right way. So, so you have to build a perception system. However,
link |
01:26:19.920
wrong that might be, you have to still build that and learn concepts from there and then,
link |
01:26:24.640
you know, keep iterating. And, and finally, perception will get solved fully when perception,
link |
01:26:30.480
cognition, language, all those things work together. Finally.
link |
01:26:33.920
So what, and so great, we've talked a lot about perception, but then maybe on the concept side
link |
01:26:40.160
and like common sense, or just general reasoning side, is there some, some intuition you can
link |
01:26:47.280
draw from the brain about how we could do that? So I have, I have this classic example I give.
link |
01:26:55.440
So suppose I give you a few sentences, and then ask you a question following that sentence,
link |
01:27:01.120
this is a natural language processing problem, right? So here goes. I'm telling you,
link |
01:27:06.720
Sally pounded a nail on the ceiling. Okay. That's a sentence. Now I'm asking you a question.
link |
01:27:14.560
Is the nail horizontal or vertical? Vertical. Okay. How did you answer that?
link |
01:27:22.240
Well, I imagined Sally, it was kind of hard to imagine what the hell she was doing, but
link |
01:27:29.760
but I imagined I had a visual of the whole situation. Exactly. Exactly. So, so, so here,
link |
01:27:36.160
you know, I, I posed a question in natural language. The answer to that question was you,
link |
01:27:41.360
you got the answer from actually simulating the scene. Now I can go more and more detail about,
link |
01:27:46.880
okay, was Sally standing on something while doing this, you know, could, could she have been
link |
01:27:52.480
standing on a light bulb to do this? You know, I could, I could ask more and more questions
link |
01:27:56.960
about this. And I can make you simulate the scene in more and more detail, right?
link |
01:28:01.680
Where is all that knowledge that you're accessing stored? It is not in your language system. It is
link |
01:28:08.560
not, it was not just by reading text, you got that knowledge. It is stored from the everyday
link |
01:28:14.720
experiences that you have had from and by the, by the age of five, you, you have pretty much all
link |
01:28:20.560
of this, right? And it is stored in your visual system, motor system in a way such that it can
link |
01:28:26.640
be accessed through language. I got it. I mean, right. So here, the language almost just
link |
01:28:33.440
serves as the query into the whole visual cortex, and that does the whole feedback thing. But I mean,
link |
01:28:38.160
it is all reasoning kind of connected to the perception system in some way. You can do a lot
link |
01:28:45.680
of it, you know, you can still do a lot of it by quick associations without having to go into the
link |
01:28:52.400
depth. And most of the time, you will be right, right? You can just do quick associations, but
link |
01:28:57.920
I can easily create tricky situations for you where that quick associations is wrong,
link |
01:29:02.160
and you have to actually run the simulation. So the figuring out the, how these concepts connect.
link |
01:29:09.600
Do you have a good idea of how to do that? That's exactly what that's the one of the problems
link |
01:29:14.720
that we are working on. And, and, and the, the way we are approaching that is basically saying,
link |
01:29:19.920
okay, you need to, so the, the takeaway is that language is simulation control. And your perceptual
link |
01:29:28.960
plus motor system is building a simulation of the world. And so, so that's basically the way
link |
01:29:36.960
we are approaching it. And the first thing that we built was a controllable perceptual system.
link |
01:29:42.080
And we built schema networks, which was a controllable dynamics system. Then we built a
link |
01:29:47.440
concept learning system that puts all these things together into programs as abstractions that you
link |
01:29:53.120
can run and simulate. And now we are taking the step of connecting it to language. And, and
link |
01:30:00.240
it will be very simple examples initially, it will not be the GPT three like examples,
link |
01:30:04.880
but it will be grounded simulation based language. And for like the, the querying would be like question
link |
01:30:12.640
answering kind of thing. And it will be in some simple world initially on, you know, but it will
link |
01:30:19.760
be about, okay, can the system connect the language and ground it in the right way and run the right
link |
01:30:25.840
simulations to come up with the answer. And the goal is to try to do things that, for example,
link |
01:30:30.160
GPT three couldn't do. Speaking of which, if we could talk about GPT three a little bit, I think
link |
01:30:39.360
it's an interesting, thought provoking set of ideas that OpenAI is pushing forward. I think it's
link |
01:30:45.760
good for us to talk about the limits and the possibilities and you know, that work. So in
link |
01:30:50.800
general, what are your thoughts about this recently released very large 175 billion parameter
link |
01:30:58.000
language model? So I have, I haven't directly evaluated it yet from what I have seen on Twitter
link |
01:31:04.240
and you know, other people evaluating it, it looks very intriguing. You know, I am very intrigued by
link |
01:31:09.600
some of the properties it is displaying. And of course, the text generation part of that was
link |
01:31:16.960
already evident in GPT two, you know, that it can generate coherent text over long distances.
link |
01:31:23.840
That was, but of course, the weaknesses are also pretty visible in saying that, okay,
link |
01:31:29.680
it is not really carrying a world state around. And, you know, sometimes you get sentences like,
link |
01:31:36.080
I went up the hill to reach the valley, or things that are, you know, completely
link |
01:31:41.360
incompatible statements or when you're traveling from one place to the other, it doesn't take
link |
01:31:46.560
into account the time of travel, things like that. So those things, I think, will happen less in
link |
01:31:51.600
GPT three, because it is trained on even more data. And so, and it has, it can do even more
link |
01:31:58.320
longer distance coherence. But it will still have the fundamental limitations that it doesn't
link |
01:32:05.600
have a world model. And it can't run simulations in its head to find whether something is true
link |
01:32:12.000
in the world or not. Do you think within, so it's taking a huge amount of text from the internet
link |
01:32:17.680
and forming a compressed representation, do you think from that could emerge something that's an
link |
01:32:24.800
approximation of a world model, which essentially could be used for reasoning? I'm not talking
link |
01:32:33.200
about GPT three, I'm talking about GPT four, five and GPT 10. Yeah, I mean, they will look
link |
01:32:38.800
more impressive than GPT three. So you can, if you take that to the extreme, then a Markov chain
link |
01:32:45.360
of just first order, and if you go to, I'm taking the other extreme, if you read Shannon's book,
link |
01:32:54.080
right? He has a model of English text, which is based on first order Markov chains,
link |
01:33:00.320
second order Markov chains, third order Markov chains and saying that, okay, third order Markov
link |
01:33:03.760
chains look better than first order Markov chains. So does that mean a first order Markov chain has
link |
01:33:10.640
a model of the world? Yes, it does, at that level.
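For reference, the n-th order Markov model of text being alluded to is the standard textbook construction (this sketch is generic, not Shannon's original formulation or anyone's production system): it captures statistics of the text it was fed, which is exactly the sense in which it "has a model" without having a model of the world.

    import random
    from collections import Counter, defaultdict

    def fit_markov(words, order=3):
        # Count which word follows each length-'order' context in the training text.
        counts = defaultdict(Counter)
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            counts[context][words[i + order]] += 1
        return counts

    def generate(counts, seed_context, length=50, seed=0):
        # Sample forward from the conditional counts; the output mimics the training
        # text's statistics without knowing anything about the world the text describes.
        rng = random.Random(seed)
        order = len(seed_context)
        out = list(seed_context)
        for _ in range(length):
            nxt = counts.get(tuple(out[-order:]))
            if not nxt:
                break
            out.append(rng.choices(list(nxt.keys()), weights=list(nxt.values()))[0])
        return " ".join(out)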
link |
01:33:19.120
So when you grow higher order models, or more sophisticated structure in the model like the transformer networks have, yes, they have a
link |
01:33:25.200
model of the text world. But that is not a model of the world. It's a model of the text world and
link |
01:33:33.440
it will have interesting properties and it will be useful. But just scaling it up is not going to
link |
01:33:42.240
give us AGI or natural language understanding or meaning. The question is whether being forced
link |
01:33:53.360
to compress a very large amount of text forces you to construct things that are very much like,
link |
01:34:02.960
because the ideas of concepts and meaning is a spectrum. Sure. So in order to form that kind
link |
01:34:11.520
of compression, maybe it will be forced to figure out abstractions which look an awful lot like
link |
01:34:23.120
the kind of things that we think about as concepts, as world models, as common sense. Is that possible?
link |
01:34:31.120
No, I don't think it is possible because the information is not there.
link |
01:34:34.000
Well, the information is there behind the text, right?
link |
01:34:38.640
No, unless somebody has written down all the details about how everything works in the world
link |
01:34:44.400
to the absurd amounts like, okay, it is easier to walk forward than backward,
link |
01:34:50.240
that you have to open the door to go out of the thing, doctors wear underwear,
link |
01:34:55.200
unless all these things somebody has written down somewhere or somehow the program found
link |
01:34:59.520
it to be useful for compression from some other text, the information is not there.
link |
01:35:04.480
So that is an argument that text is a lot lower fidelity than the experience of our physical world.
link |
01:35:13.040
Correct. A picture is worth a thousand words.
link |
01:35:17.440
Well, in this case, pictures are not really, so the richest aspect of the physical world
link |
01:35:23.600
is not even just pictures, it is the interactivity of the world.
link |
01:35:28.320
Exactly. Yeah, it is being able to interact. It is almost like if you could interact.
link |
01:35:40.080
I disagree. Well, maybe I agree with you that pictures are worth a thousand words, but a thousand...
link |
01:35:45.840
You could say you could capture it with the GPTX.
link |
01:35:49.760
So I wonder if there is some interactive element where a system could live in text world,
link |
01:35:54.080
where it could be part of the chat, be part of talking to people.
link |
01:36:00.640
It is interesting. Fundamentally, you are making a statement about the limitation of text.
link |
01:36:07.520
Okay, so let us say we have a text corpus that includes basically every experience we could
link |
01:36:17.360
possibly have. I mean, just a very large corpus of text and also interactive components.
link |
01:36:23.120
I guess the question is whether the neural network architecture, these very simple transformers,
link |
01:36:28.560
but if they had hundreds of trillions or whatever comes after a trillion parameters,
link |
01:36:36.320
whether that could store the information needed. That is architecturally. Do you have thoughts
link |
01:36:45.120
about the limitation on that side of things with neural networks?
link |
01:36:48.880
I mean, so transformer is still a feed forward neural network. It has a very interesting
link |
01:36:56.480
architecture, which is good for text modeling and probably some aspects of video modeling,
link |
01:37:01.520
but it is still a feed forward architecture. Do you believe in the feedback mechanism, the
link |
01:37:06.240
recursion? Oh, and also causality, being able to do counterfactual reasoning, being able to do
link |
01:37:14.160
interventions, which is actions in the world. So all those things require different kinds of
link |
01:37:22.240
models to be built. I don't think a transformer captures that family. It is very good at statistical
link |
01:37:30.720
modeling of text. And it will become better and better with more data, bigger models.
link |
01:37:37.520
But that is only going to get so far. Finally, when you... So I had this joke on Twitter saying
link |
01:37:44.880
that, hey, this is a model that has read all of quantum mechanics and theory of relativity,
link |
01:37:52.080
and we are asking it to do text completion, or we are asking it to solve simple puzzles.
link |
01:37:58.960
When you have AGI, that's not what you ask a system to do. If it has...
link |
01:38:03.200
We'll ask the system to do experiments and come up with hypotheses and revise the hypotheses
link |
01:38:10.880
based on evidence from experiments, all those things. Those are the things that we want the
link |
01:38:14.720
system to do when we have AGI, not solve simple puzzles. Like impressive demo,
link |
01:38:21.440
somebody generating a red button in HTML, which are all useful. There's no dissing the usefulness.
link |
01:38:29.440
So I get... By the way, I'm playing a little bit of a devil's advocate. So calm down,
link |
01:38:35.680
internet. So I'm curious, almost, in which ways a dumb but large neural network will surprise us.
link |
01:38:48.480
So it's kind of your... I completely agree with your intuition. It's just that I don't want to
link |
01:38:54.960
dogmatically, 100% put all the chips there. We've been surprised so much. Even the current
link |
01:39:04.320
GPT2 and 3 are so surprising. The self play mechanisms of AlphaZero are really surprising.
link |
01:39:18.160
The fact that reinforcement learning works at all to me is really surprising. The fact that
link |
01:39:22.320
neural networks work at all is quite surprising. Given how nonlinear the space is, the fact that
link |
01:39:28.080
it's able to find local minima that are all reasonable, it's very surprising. So I wonder
link |
01:39:36.080
sometimes whether us humans just want it to not... For AGI not to be such a dumb thing.
link |
01:39:46.480
Because exactly what you're saying is like the ideas of concepts and be able to reason with
link |
01:39:54.880
those concepts and connect those concepts in like hierarchical ways and then to be able to have
link |
01:40:02.480
world models. Just everything we're describing in human language in this poetic way seems to
link |
01:40:08.720
make sense that that is what intelligence and reasoning are like. I wonder if at the core of
link |
01:40:13.120
it it could be much dumber. Well, finally it is still connections and messages passing over them.
link |
01:40:23.600
So I guess the recursion, the feedback mechanism, that does seem to be a fundamental kind of thing.
link |
01:40:31.280
Yeah, yeah. The idea of concepts, also memory. Correct. Having an episodic memory.
link |
01:40:38.560
Yeah. That seems to be an important thing. So how do we get memory? So yeah, we have another
link |
01:40:44.320
piece of work which came out recently on how do you form episodic memories and form
link |
01:40:50.640
abstractions from them. And we haven't figured out all the connections of that to the overall
link |
01:40:56.240
cognitive architecture. But yeah, what are your ideas about how you could have episodic memory?
link |
01:41:03.040
So at least it's very clear that you need to have two kinds of memory. That's very, very clear.
link |
01:41:08.720
Which is there are things that happen as statistical patterns in the world. But then there is the
link |
01:41:17.840
one timeline of things that happen only once in your life. And this day is not going to happen
link |
01:41:23.840
ever again. And so that needs to be stored as, you know, just a stream, a string: this is my
link |
01:41:32.560
experience. And then the question is about how do you take that experience and connect it to the
link |
01:41:38.960
statistical part of it? How do you now say that, okay, I experienced this thing. Now I want to
link |
01:41:45.200
be careful about similar situations. And so you need to be able to index that similarity
link |
01:41:52.560
using your other knowledge, that is, you know, the model of the world that you have learned. Although
link |
01:41:58.880
the situation came from the episode, you need to be able to index the other one. So
link |
01:42:04.960
the episodic memory being implemented as an indexing over the other model that you're building.
link |
01:42:13.440
So the memories remain and they, they, they're an index into this, like the statistical thing that
link |
01:42:22.800
you formed. Yeah, statistical or causal structural model that you built over time. So,
link |
01:42:28.560
so it's basically the idea is that the hippocampus is just storing or sequencing
link |
01:42:35.680
a, a set of pointers to things that happen over time. And then whenever you want to reconstitute that
link |
01:42:43.920
memory and evaluate the different aspects of it, whether it was good, bad, do I need to encounter
link |
01:42:50.560
the situation again, you need the cortex to reinstantiate it, to replay that memory.
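A hedged sketch of that episodic-index idea, with 'encode', 'similarity', and 'replay_through_model' as stand-ins for whatever the learned cortical model provides (nothing here is from the actual paper): the episodic store keeps only time-stamped pointers, and recall means finding similar pointers and replaying that stretch back through the model.

    def store_episode(episodic_index, t, observation, encode):
        episodic_index.append((t, encode(observation)))       # store a pointer, not raw experience

    def recall(episodic_index, current_observation, encode, similarity, top_k=1):
        # 'Deja vu': find stored pointers whose codes resemble the code of the
        # current situation, i.e. index the episode through the learned model.
        code = encode(current_observation)
        ranked = sorted(episodic_index, key=lambda item: similarity(code, item[1]), reverse=True)
        return ranked[:top_k]                                  # timeline positions worth rewinding to

    def replay(episodic_index, t_start, t_end, replay_through_model):
        # Rewind: feed the stored stretch of pointers back through the model
        # ('reinstantiate it in the cortex') instead of being driven by current input.
        window = [code for t, code in episodic_index if t_start <= t <= t_end]
        return replay_through_model(window)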
link |
01:42:57.840
So how do you find that memory? Like, which direction is the important direction?
link |
01:43:01.760
Both directions are, you know, it's again bi directional. I mean, I guess how do you retrieve
link |
01:43:07.440
the memory? So this is again hypothesis, right? We're making this up. So when you, when you come
link |
01:43:12.480
to a new situation, right, your, your cortex is doing inference over in the new situation. And
link |
01:43:19.920
then, of course, hippocampus is connected to different parts of the cortex. And, and you have
link |
01:43:25.840
this deja vu situation, right? Okay, I have seen this thing before. And, and then in the hippocampus,
link |
01:43:35.280
you can have an index of, okay, this is when it happened as a timeline. And, and, and then,
link |
01:43:42.640
then you can use the hippocampus to drive the similar timelines to say, now,
link |
01:43:48.960
rather than being driven by my current input stimuli, I am going back in time and rewinding
link |
01:43:55.840
my experience and replaying it, putting it back into the cortex. And putting it back
link |
01:44:00.240
into the cortex, of course affects what you're going to see next in your current situation.
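Continuing the same toy sketch, and with the same caveat that this is only an illustration of the hypothesis as stated rather than of any real hippocampal circuit, retrieval and replay could look something like this: encode the current input with the shared model, find stored episodes whose pointers include the currently active concept (the "deja vu" step), then decode the stored pointer sequence back into concept prototypes so it can bias what the cortex expects next. The episode names and the exact-match retrieval rule are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
concepts = rng.normal(size=(50, 16))                 # shared learned world model

def encode(observation):
    """Map a raw observation to the index of the closest learned concept."""
    return int(np.argmin(np.linalg.norm(concepts - observation, axis=1)))

# Episodic store: named episodes, each a sequence of (time, concept_id) pointers.
episodes = {
    "tuesday_walk": [(0, 4), (1, 17), (2, 4), (3, 9)],
    "first_day_at_work": [(0, 31), (1, 4), (2, 22)],
}

def recall(current_observation):
    """Cue-based retrieval: find episodes that contain the currently active concept."""
    cue = encode(current_observation)
    return [name for name, seq in episodes.items()
            if any(cid == cue for _, cid in seq)]

def replay(name):
    """Replay: decode each stored pointer back into its concept prototype,
    feeding it to the 'cortex' instead of the live input stream."""
    return [concepts[cid] for _, cid in episodes[name]]

now = rng.normal(size=16)
print("recalled:", recall(now))
for name in recall(now):
    reconstructed = replay(name)        # would bias what the cortex expects next
    print(name, len(reconstructed), "steps replayed")
```

A real system would presumably use a richer similarity measure than exact index matching, but the direction of information flow, cortex cueing the hippocampus and the hippocampus driving the cortex back, is the part the sketch is trying to capture.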
link |
01:44:04.960
Got it. Yeah. So that's, that's the whole thing, having a world model and then yeah,
link |
01:44:10.160
connecting to the perception. Yeah, it does seem to be that that's what's happening.
link |
01:44:14.560
On the neural network side, it's interesting to think of how we actually do that.
link |
01:44:21.600
Yeah. Yeah. So have a knowledge base. Yes. It is possible that you can put many of these structures
link |
01:44:28.160
into neural networks and we will find ways of combining properties of neural networks and
link |
01:44:36.880
graphical models. So, I mean, it's already started happening. Yes. Graph neural networks are kind
link |
01:44:42.720
of a merge between them. And there will be more of that thing. But to me, the direction
link |
01:44:49.040
is pretty clear when looking at biology and the evolutionary history of intelligence.
link |
01:44:57.600
It is pretty clear that, okay, what is needed is more structure in the models and modeling of the world
link |
01:45:05.520
and supporting dynamic inference. Well, let me ask you. There's a guy named Elon Musk. There's a
link |
01:45:12.960
company called Neuralink and there's a general field called brain computer interfaces. Yeah.
link |
01:45:18.080
It's kind of an interface between your two loves. Yes. The brain and the intelligence. So there's
link |
01:45:26.080
like very direct applications of brain computer interfaces for people with different conditions,
link |
01:45:32.080
more in the short term. Yeah. But there's also these sci fi futuristic kinds of ideas of AI systems
link |
01:45:38.800
being able to communicate in a high bandwidth way with the brain, bidirectionally. Yeah. What are
link |
01:45:46.720
your thoughts about Neuralink and BCI in general as a possibility? So I think BCI is a cool research
link |
01:45:55.920
area. And in fact, when I got interested in brains initially, when I was enrolled at Stanford,
link |
01:46:03.120
and when I got interested in brains, it was through a brain computer interface talk
link |
01:46:08.880
that Krishna Shenoy gave. That's when I even started thinking about the problem. So it is
link |
01:46:14.800
definitely a fascinating research area. And the applications are enormous. So there's
link |
01:46:21.120
the science fiction scenario of brains directly communicating. Let's keep that aside for the
link |
01:46:26.240
time being. Even just the intermediate milestones that they are pursuing, which are very reasonable as far
link |
01:46:32.480
as I can see, being able to control an external limb using direct connections from the brain
link |
01:46:40.560
and being able to write things into the brain. So those are all good steps to take. And they have
link |
01:46:48.560
enormous applications, people losing limbs, being able to control prosthetics, quadriplegics,
link |
01:46:55.120
being able to control something. So, therapeutics. And I also know about another company working in
link |
01:47:01.440
the space called Paradromics. They're based on a different electrode array, but trying to attack
link |
01:47:09.120
some of the same problems. So I think it's a very... Also surgery? Correct, surgically implanted
link |
01:47:14.800
electrodes. Yeah. So yeah, I think of it as a very, very promising field, especially when it is
link |
01:47:22.560
helping people overcome some limitations. Now, at some point, of course, it will advance to the level
link |
01:47:28.640
of being able to communicate. How hard is that problem, do you think? So okay, let's say we
link |
01:47:35.760
magically solve what I think is a really hard problem of doing all of this safely.
link |
01:47:41.680
Yeah. So being able to connect electrodes, not just thousands but like millions, to the brain.
link |
01:47:50.000
Yeah. I think it's very, very hard because you also do not know what will happen to the brain
link |
01:47:56.800
with that, right? In the sense of how does the brain adapt to something like that?
link |
01:47:59.920
And as we were learning, the brain is quite... In terms of neuroplasticity, it's pretty malleable.
link |
01:48:07.200
Correct. So it's going to adjust. Correct. So the machine learning side, the computer side is going
link |
01:48:12.400
to adjust and then the brain is going to adjust. Exactly. And then what soup does this land us
link |
01:48:16.880
into is... The kind of hallucinations you might get from this. That might be pretty intense.
link |
01:48:22.960
Yeah. Yeah. So just connecting to all of Wikipedia. It's interesting whether we need to be able to
link |
01:48:29.600
figure out the basic protocol of the brain's communication schemes in order to get them,
link |
01:48:36.400
the machine and the brain, to talk. Because another possibility is the brain actually just
link |
01:48:41.760
adjusts to whatever the heck the computer is doing. Exactly. That's the way, I think. I find that
link |
01:48:46.240
to be a more promising way. It's basically saying, you know, okay, attach electrodes to some part of
link |
01:48:52.080
the cortex. Okay. And maybe if it is done from birth, the brain will adapt. Let's say that, you
link |
01:48:58.480
know, that part is not damaged. It was not used for anything. These electrodes are attached there,
link |
01:49:02.640
right? And now you train that part of the brain to do this high bandwidth communication with
link |
01:49:09.120
something else, right? And if you do it like that, then it is the brain adapting. And of course,
link |
01:49:15.600
your external system is designed such that it is adaptable. You know, just like we, you know,
link |
01:49:20.000
design computers or mouse keyboard, all of them to be interacting with humans. So of course,
link |
01:49:27.920
that feedback system is designed to be human compatible. But now it is not trying to record
link |
01:49:36.080
from all of the brain and, you know, having two systems trying to adapt to each other.
link |
01:49:41.680
It's the brain adapting in one direction. That's fascinating. The brain is connected to like the
link |
01:49:47.440
internet. It's connected. Just imagine connecting it to Twitter and just taking
link |
01:49:53.360
that stream of information. Yeah. But again, if we take a step back, I don't know what your
link |
01:50:01.200
intuition is. I feel like that is not as hard of a problem as doing it safely. There's,
link |
01:50:11.280
there's a huge barrier to surgery. Right. Because the biological system,
link |
01:50:17.520
it's a mush of like weird stuff. Correct. So the surgery part of it, the biology part of it,
link |
01:50:24.560
the long term repercussions part of it. Again, I don't know what else will, you know, we,
link |
01:50:30.880
we often find after a long time in biology that, okay, that idea was wrong, right? You know, so
link |
01:50:38.080
people used to cut off this gland called the thymus or something. And then they found that,
link |
01:50:45.680
oh, no, that actually causes cancer. And then there's a subtle like millions of variables
link |
01:50:53.440
involved. But this whole process, the nice thing, and just like, again, with Elon, just like colonizing
link |
01:51:00.320
Mars seems like a ridiculously difficult idea. But in the process of doing it, we might learn a lot
link |
01:51:07.040
about the biology, the neurobiology of the brain, the neuroscience side of things. It's like,
link |
01:51:12.160
if you want to learn something, do the most difficult version of it and see what you learn.
link |
01:51:18.400
The intermediate steps that they are taking sounded all very reasonable to me.
link |
01:51:22.240
Yeah. It's great. Well, like everything with Elon, the timeline seems insanely fast. So
link |
01:51:29.680
that's, that's the open question. Well, we've been talking about cognition a little bit.
link |
01:51:36.080
So like reasoning, we haven't mentioned the other C word, which is consciousness. Do you ever think
link |
01:51:42.800
about that one? Is that useful at all in this whole context of what it takes to create an
link |
01:51:50.320
intelligent reasoning being? Or is that completely outside of your, like the engineering perspective
link |
01:51:57.200
of intelligence? So it is not outside the realm, but it doesn't, on a day to day, you know,
link |
01:52:03.600
basis inform what we do. But it's more, so in many ways, the company name is connected to
link |
01:52:10.640
this idea of consciousness. What's the company name?
link |
01:52:13.680
Vicarious. So Vicarious is the company name. And so what does Vicarious mean? At the first level,
link |
01:52:23.120
it is about modeling the world. And it is internalizing the external actions. So you interact
link |
01:52:30.640
with the world and learn a lot about the world. And now after having learned a lot about the world,
link |
01:52:36.080
you can run those things in your mind without actually having to act in the world. So you can
link |
01:52:43.120
run things vicariously, just in your, in your, in your brain. And similarly, you can experience
link |
01:52:49.520
another person's thoughts by, you know, having a model of how that person works and, and running
link |
01:52:56.080
their model, you know, putting yourself in some other person's shoes. So that is being vicarious.
link |
01:53:01.360
Now, it's the same modeling apparatus that you're using to model the external world
link |
01:53:06.800
or some other person's thoughts. You can turn it to yourself. You know, if that same
link |
01:53:13.200
modeling thing is applied to your own modeling apparatus, then that is what gives rise to
link |
01:53:19.760
consciousness, I think. Well, that's more like self awareness. There's the hard problem of
link |
01:53:24.720
consciousness, which is like, when the model becomes, when the model feels like something,
link |
01:53:32.640
when the whole process is like, it's like, you really are in it. You feel like an entity in
link |
01:53:40.560
this world. Not just that you know that you're an entity, but it feels like something to be that
link |
01:53:46.080
entity. And thereby we attribute this, you know, it starts to be where
link |
01:53:55.760
something that has consciousness can suffer, you start to have these kinds of things that we can
link |
01:54:00.480
reason about that is much, much heavier. It seems like there's much greater cost to your,
link |
01:54:09.200
your decisions. And like mortality is tied up into that, like the fact that these things end.
link |
01:54:16.800
Right. First of all, I end at some point, and then other things end. And, you know, that, that
link |
01:54:24.480
somehow seems to be, at least for us humans, a deep motivator. Yes. And that, you know, that,
link |
01:54:32.480
that idea of motivation in general, we talk about goals in AI, but the goals aren't quite the same
link |
01:54:39.840
thing as like the, our mortality. It feels like, it feels like, first of all, humans don't have a
link |
01:54:46.240
goal. And they just kind of create goals at different levels. They like make up goals.
link |
01:54:52.880
Because we're terrified by the mystery of the thing that gets us all. So we make these goals
link |
01:55:01.600
up. So we're like a goal generation machine, as opposed to a machine which optimizes the trajectory
link |
01:55:08.640
towards a singular goal. So it feels like that's an important part of cognition, that whole mortality
link |
01:55:16.720
thing. Well, it is, it is a part of human cognition. But there is no reason for that mortality to come
link |
01:55:27.360
into the equation for an artificial system, because we can copy the artificial system. The problem with
link |
01:55:36.160
humans is that we can't, I can't clone you. I can't, you know, I can, I can, even if I clone
link |
01:55:41.760
you as a, you know, the hardware, your experience that was stored in your brain, your episodic
link |
01:55:48.800
memory, all those will not be captured in the, in the new clone. So, but that's not the same with
link |
01:55:54.960
an AI system, right? So, but it's also possible that the, the thing that you mentioned with us
link |
01:56:02.880
humans is actually of fundamental importance for intelligence. So like the fact
link |
01:56:07.760
that you can copy an AI system means that that AI system is not yet an AGI. So like,
link |
01:56:16.080
so if you look at existence proof, if we reason, based on existence proof, you could say that it
link |
01:56:22.720
doesn't feel like death is a fundamental property of an intelligent system. But we don't yet,
link |
01:56:30.320
give me an example of an immortal intelligent being. We don't have those. It could, it's very
link |
01:56:36.960
possible that, you know, a fundamental property of intelligence is being a thing that
link |
01:56:45.520
has a deadline for itself. So you can think of it like this. So suppose you invent a way to
link |
01:56:51.840
freeze people for a long time. It's not dying, right? So, so you can be frozen and woken up
link |
01:56:59.600
thousands of years from now. So there's no fear of death.
link |
01:57:04.560
Well, no, you're still, it's not, it's not about time. It's about the knowledge that it's temporary.
link |
01:57:12.400
And that aspect of it, the finiteness of it, I think creates a kind of urgency.
link |
01:57:21.520
Correct. For us, for humans. Yeah, for humans. Yes. And that, that is part of our drives.
link |
01:57:27.920
But, and that's why I'm not too worried about AI, you know, having motivations to kill all humans
link |
01:57:37.920
and those kinds of things. Why? Just wait, you know. So, why do you need to do that?
link |
01:57:45.440
I've never heard that before. That's a good point. Yeah, just murder seems like a lot of work.
link |
01:57:52.800
Just wait, wait it out. They'll probably hurt themselves. Let me ask you, people often kind
link |
01:58:00.480
of wonder about world class researchers such as yourself, what kind of books, technical, fiction,
link |
01:58:09.040
philosophical, had an impact on you in your life? And maybe ones you could possibly
link |
01:58:17.520
recommend that others read? Maybe if you have three books that pop into mind?
link |
01:58:23.120
Yeah. So I definitely liked Judea Pearl's book, Probabilistic Reasoning in Intelligent Systems.
link |
01:58:29.760
It's, it's a very deep technical book. But what I liked is that there are many
link |
01:58:35.280
places where you can learn about probabilistic graphical models from. But throughout this book,
link |
01:58:40.960
Judea Pearl kind of sprinkles his philosophical observations and, and he thinks about connections
link |
01:58:47.200
to how the brain thinks, and attention and resources, all those things. So, so that whole
link |
01:58:52.080
thing makes it more interesting to read. He emphasizes the importance of causality.
link |
01:58:57.840
So that was in his later book. So this was the first book, Probabilistic Reasoning in
link |
01:59:01.680
Intelligent Systems. He mentions causality, but he hadn't really sunk his teeth into,
link |
01:59:07.280
like, you know, how do you actually formalize that? Yeah. And the second book, Causality, was
link |
01:59:12.800
in 2000. That one is really hard. So I wouldn't recommend that.
link |
01:59:17.680
Oh, yeah. So that looks at the, like the mathematical, like his model of
link |
01:59:23.120
do-calculus. Yeah, it was pretty dense mathematics. Right. Right. The Book of Why is
link |
01:59:27.760
definitely more enjoyable. Oh, for sure. Yeah. So I would, I would recommend Probabilistic
link |
01:59:32.320
Reasoning in Intelligent Systems. Another book I liked was one from Doug Hofstadter.
link |
01:59:39.120
This is a long time ago. He had a book, I think it was called The Mind's I.
link |
01:59:43.440
It was probably Hofstadter and Daniel Dennett together. Yeah. So, I actually
link |
01:59:51.120
bought that book, but I haven't read it yet. I couldn't get an electronic version of it,
link |
01:59:58.240
which is annoying because I read everything on Kindle. Oh, okay. I had to actually purchase
link |
02:00:04.400
the physical copy. It's like one of the only physical books I have. Yeah. Anyway, a lot of
link |
02:00:08.960
people recommended it highly. So, yeah. And the third one I would definitely recommend reading is
link |
02:00:14.480
this is not a technical book. It is history. The name of the book, I think,
link |
02:00:22.240
is The Bishop's Boys. It's about the Wright brothers and their path and how it was. There are multiple
link |
02:00:32.720
books on this topic and all of them are great. It's fascinating how flight was treated as an
link |
02:00:42.960
unsolvable problem. And also, what aspects did people emphasize? People thought, oh,
link |
02:00:50.640
it is all about just powerful engines. We just need to have powerful lightweight engines.
link |
02:00:57.920
And so, some people thought of it as, how far can we just throw the thing? Just throw it.
link |
02:01:04.960
Like a catapult. Yeah. So, it's very fascinating. And even after they made the invention,
link |
02:01:13.120
people were not believing it. And the social aspect of it, yeah.
link |
02:01:16.800
The social aspect of it is very fascinating. Do you draw any parallels? Birds fly.
link |
02:01:25.200
So, there's the natural approach to flight and then there's the engineered approach. Do you
link |
02:01:30.800
see the same kind of thing with the brain and us trying to engineer intelligence?
link |
02:01:37.280
Yeah, it's a good analogy to have. Of course, all analogies have their, you know,
link |
02:01:43.600
limits. So, people in AI often use airplanes as an example of, hey, we didn't learn anything
link |
02:01:53.520
from birds. Look, the saying is, airplanes don't flap wings.
link |
02:02:01.600
This is what they say. The funny thing and the ironic thing is that, that you don't need to flap
link |
02:02:08.000
to fly is something the Wright brothers found by observing birds. So, they have in their notebooks,
link |
02:02:17.440
in some of these books, they show their notebook drawings. They make detailed
link |
02:02:22.480
notes about buzzards just soaring over thermals. And they basically say, look, flapping is not
link |
02:02:29.600
the important, propulsion is not the important problem to solve here. We want to solve control.
link |
02:02:35.360
And once you solve control, propulsion will fall into place. All of these things,
link |
02:02:40.400
you know, they realized by observing birds.
link |
02:02:44.480
Beautifully put. That's actually brilliant. Because people do use that analogy a lot. I'm
link |
02:02:49.280
going to have to remember that one. Do you have advice for people interested in artificial
link |
02:02:54.480
intelligence, like young folks today? I talk to undergraduate students all the time,
link |
02:02:59.200
interested in neuroscience, interested in understanding how the brain works. Is there
link |
02:03:03.840
advice you would give them about their career, maybe about their life in general?
link |
02:03:09.520
Sure. I think every, you know, every piece of advice should be taken with a pinch of salt,
link |
02:03:14.720
because, you know, each person is different. Their motivations are different. But I can
link |
02:03:19.200
definitely say, if your goal is to understand the brain from the angle of wanting to build one,
link |
02:03:26.880
you know, then being an experimental neuroscientist might not be the way to go about it.
link |
02:03:36.800
A better way to pursue it might be through computer science, electrical engineering,
link |
02:03:42.640
machine learning, and AI. And of course, you have to study up on the neuroscience,
link |
02:03:46.240
but that you can do on your own. If you're more attracted by
link |
02:03:53.920
discovering something intriguing about the brain, then of course, it is better to be an
link |
02:03:58.960
experimentalist. So find that motivation. What are you intrigued by? And of course,
link |
02:04:03.120
find your strengths too. Some people are very good experimentalists, and they enjoy doing that.
link |
02:04:09.440
And it's interesting to see which department, if you're, if you're picking in terms of like
link |
02:04:16.240
your education path, whether to go with, like at MIT, it's brain and computer, no,
link |
02:04:25.520
BCS. Brain and Cognitive Sciences, yeah. Or the CS side of things. And actually,
link |
02:04:33.360
the brain folks, the neuroscience folks, are more and more now embracing, you know,
link |
02:04:40.000
learning TensorFlow and PyTorch, right? They see the power of trying to engineer ideas
link |
02:04:49.200
that they get from the brain, and then explore how those could be used to
link |
02:04:55.360
create intelligent systems. So that might be the right department, actually.
link |
02:04:58.640
Yeah. So this was a question in, you know, one of the Redwood Neuroscience Institute
link |
02:05:04.400
workshops that Jeff Hawkins organized almost 10 years ago. This question was put to a panel,
link |
02:05:11.120
right? What should be the undergrad major you should take if you want to understand the brain?
link |
02:05:16.160
And the majority opinion on that one was electrical engineering.
link |
02:05:22.080
Interesting. Because, I mean, I'm an EE undergrad, so I got lucky in that way.
link |
02:05:26.960
But I think it does have some of the right ingredients because you learn about circuits,
link |
02:05:33.280
you learn about how you can construct circuits to, you know, do functions.
link |
02:05:40.080
You learn about microprocessors, you learn information theory, you learn signal processing,
link |
02:05:45.280
you learn continuous math. So in that way, it's a good step if you want to go to
link |
02:05:52.160
computer science or neuroscience. The downside is you're more likely to
link |
02:05:58.000
be forced to use MATLAB. So one of the interesting things about, I mean, this is changing, the world
link |
02:06:09.040
is changing, but like certain departments lagged on the programming side of things, on developing
link |
02:06:15.920
good habits of software engineering, but I think that's more and more changing. And
link |
02:06:21.520
students can take it into their own hands, like learn to program. I feel like everybody
link |
02:06:27.520
should learn to program, like everyone in the sciences, because it empowers you,
link |
02:06:35.360
it puts the data at your fingertips. So you can organize it, you can find all kinds of things in
link |
02:06:40.800
the data. And then you can also, for the appropriate sciences, build systems based on that.
link |
02:06:47.840
So, like, then engineer intelligent systems. We already talked about mortality. So we hit
link |
02:06:55.760
a ridiculous point, but let me ask you the, you know,
link |
02:07:01.680
one of the things about intelligence is it's goal driven. And you study the brain. So the
link |
02:07:12.960
question is like, what's the goal that the brain is operating under? What's the meaning of it all
link |
02:07:17.360
for us humans in your view? What's the meaning of life? The meaning of life is whatever you
link |
02:07:23.920
construct out of it. It's completely open. It's open. So there's nothing, like you mentioned,
link |
02:07:31.760
you like constraints. So it's wide open. Is there some useful aspect
link |
02:07:40.400
that you think about in terms of like the openness of it and just the basic mechanisms of generating
link |
02:07:47.440
goals in studying cognition in the brain that you think about? Or is it just about, because
link |
02:07:55.920
everything we've talked about kind of the perception system is to understand the environment.
link |
02:07:59.760
That's like to be able to like not die. Exactly. Like not fall over and like be able to,
link |
02:08:07.600
you don't think we need to think about anything bigger than that.
link |
02:08:12.320
Yeah, I think so, because it's basically being able to understand the machinery of the world
link |
02:08:20.080
such that you can pursue whatever goals you want, right? So the machinery of the world
link |
02:08:24.720
is really ultimately what we should be striving to understand. The rest is just,
link |
02:08:28.960
the rest is just whatever the heck you want to do, or whatever is culturally
link |
02:08:33.840
popular. I think that's, that's beautifully put. I don't think there's a better way to
link |
02:08:44.160
end it. I'm so honored that you would show up here and waste your time with me. It's been an
link |
02:08:50.240
awesome conversation. Thanks so much for talking today. Oh, thank you so much. This was, this was
link |
02:08:54.400
so much more fun than I expected. Thank you. Thanks for listening to this conversation with
link |
02:09:00.880
Dileep George. And thank you to our sponsors, Babbel, Raycon earbuds, and Masterclass. Please
link |
02:09:07.920
consider supporting this podcast by going to babbel.com and using code Lex, going to buyraycon.com
link |
02:09:14.720
slash Lex, and signing up at masterclass.com slash Lex. Click the links, get the discount.
link |
02:09:21.360
It really is the best way to support this podcast. If you enjoy this thing, subscribe on YouTube,
link |
02:09:26.720
review it with five stars on Apple Podcasts, support on Patreon, or connect with me on Twitter
link |
02:09:32.240
at Lex Fridman, spelled, yes, without the E, just F R I D M A N. And now let me leave you with some
link |
02:09:41.520
words from Marcus Aurelius. You have power over your mind, not outside events. Realize this
link |
02:09:50.400
and you will find strength. Thank you for listening and hope to see you next time.