The Thousand Brains Theory


>>Well, it gives me
great pleasure today to welcome to Microsoft Research, Subutai and Jeff Hawkins. Subutai Ahmad and Jeff Hawkins
have been very gracious to come down all the way and
make a very precise visit here. I think if I start introducing them, we will be here for a long time, and going through
all their accomplishments. But, I will just touch upon
the main points starting with Jeff. Jeff has had a lifelong interest
in neuroscience. He created Numenta, and it’s
focused on neurocognitive theory. I think you-all probably know
that he invented the PalmPilot, which is just incredibly cool, and ever since that time, he has been focusing on neuroscience
and doing great things here. In 2002, he founded
the Redwood Neuroscience Institute, where he served as
director for three years. We’re going to be incredibly honored to hear about his thoughts
on neuroscience. Then, we have Subutai, who is the Vice President
of research at Numenta. He’s going to talk to us today
about deep learning and vision. He has been an entrepreneur and has worked on the detailed theory of the neocortex. He holds a PhD in
Computer Science from UIUC. We’re going to have lots of fun. Thank you very much for being here.>>All right great. Thanks
Dave. I hope we have fun. So, we always try to have fun. Just a couple of words about
how this meeting came about. Subutai and I were in
Seattle back in December, we spent the day at the Allen Brain Institute
where we gave a talk. At the end of that day, through a mutual acquaintance, we came over here to Microsoft, had dinner with Satya Nadella, Eric Horvitz, and
a few other people, just to talk about what’s
going on in brain theory. After that dinner we said, “Hey, maybe it would make sense
to come and talk about our work here at Microsoft Research.” So, that’s the genesis
of why we’re here. So, let’s just jump right into it. The title of our talk is: The Thousand Brains Theory
of Intelligence, a framework for
understanding the neocortex and building intelligent machines. The way this is going to
work, I’m going to talk for half the talk and Subutai is going to talk to you
the other half of the talk. You’re going to hear both of us.
So, just a word about Numenta. Numenta is a small company. We’re in Redwood City,
Northern California. We’ve been around for about 14 years. We have about 10 employees. The way to think about
it is just to think of it as a research lab.
That’s what we are. We are a research lab. It’s
a somewhat independent research lab. We have a scientific mission. Our scientific mission is to
reverse-engineer the neocortex, and I’m going to talk about that. Our goal there is
a biologically accurate theory. This is a biological
research mission. This is not to look for inspiration
about how the brain works. We’re not interested in
theories inspired by the brain, we really want to understand how
the neurons in the brain work. We test our theories via
empirical data and simulation. We don’t do experimental
work ourselves, but we do that with collaboration
of others and publish data. Everything we do is
open and published. So, everything I’m going
to present here today has been published in
peer-reviewed journals. There are a few exceptions that
haven’t been published yet. But we don’t hold anything back, and there are no secrets in what we do. We have a second mission, which is really second in priority, and that is the following: to take what we learn from neocortical
theory and apply it to AI. Now, when I got interested in
this field almost 40 years ago, I felt immediately back then, that if we’re ever going to build
truly intelligent machines, we have to understand
how the brain works. I felt that this was really the path
there. I still believe that. So, for the last three
years at Numenta, we have done almost nothing
on this second mission, and the reason is
because we are having so much success on the first mission. We put all that aside just to focus on this stuff. I’m going
to tell you about that. But in the last few months, Subutai has been
reengaging on this sort of the machine learning side
of our mission statement, and he’s going to
talk about the recent progress we’re making there. Okay. So, that’s who we are. Just to remind you, the neocortex is about 75 percent of
a human brain by area, and the other 25 percent are things like
the autonomic nervous system, controlling heart rate and breathing, various basic behaviors
such as reflex reactions. Even things like walking and running are controlled
by the old brain, and your emotions are in the old brain. The neocortex dominates our brain, it’s the organ of intelligence. So, anything you think about, anything from our perceptions to language of all different types,
mathematics, engineering, science, art, literature, everything that you guys are paid to do here comes
out of your neocortex. My neocortex is speaking
and yours is listening. So, that’s the organ
we want to understand. The first question you
might think about is, what does the neocortex do? It took me a really long time to totally
understand this completely. Some people think, “You go
and get some inputs from the sensors and then you
act.” That’s not really true. What the neocortex does is it
builds a model of the world. So, when you’re born, that structure is there, but it really doesn’t know anything. You have to learn
a model of the world, and your model of the world is
extremely complex and rich. There are tens of thousands of
objects you know in the world. You know how they look,
and how they feel, and how they sound. You know where these objects are located and how they
interact with one another. Objects have behaviors and
you have to learn those too. My smartphone is an object
that has behaviors. It has all these complex behaviors, as I manipulate it,
things change on it. That’s all stored in the neocortex. Even simple things like
a door in this room has behaviors that opens
and handles, and so on. You have to learn all that stuff. The neocortex learns
both physical and abstract things. So, you can literally learn how a coffee cup is and what it
looks like and how it feels, but you also can learn concepts related to things like
democracy or something like that, and things you’ve never been
able to experience before. But, when it builds this very
complex model of the world, and why does it do that? The advantage of this from
an evolutionary point of view is it’s a predictive model, meaning it’s constantly predicting what’s going to happen
next in the world. It’s constantly predicting what
you’re going to see and what you’re going to feel, what
you’re going to hear. You’re not aware of most
of those predictions, but they’re happening. Because it’s a predictive model
in this very complex model, the world you can use it
to generate behaviors. So, you may sit in this room and listen me and not
really do anything for a while, but I’m changing your model
of the world slightly through this talk and later you might behave differently
based on that model. So, the question is, how does
this structure which sits in your head learn this
very complex model of the world? Now, if you were to take this out of your head and iron it flat,
it’d be about this big. It’s about 1500 square centimeters, it’s 15 inches on the side. It’s just a little bit
thicker than this, it’s about two and a
half millimeters thick. So, if you could do
this and you look at the surface of it,
there’s no demarcations. You can’t see what anything looks, it’s all like one uniform sheet. But we do know it’s divided
into functional regions. In a human, we don’t really know, but the estimate is little bit
over 100 different regions that your neocortex is divide into. There are regions that are
responsible for vision, and hearing, and touch, and language, and so on. They’re connected together through the white matter in
this very complex way. Okay. So, I’m just
going to show you here. Here, I’ve highlighted a few of
these and so there’s a bunch of visual regions in
the back of your brain here, there’s some auditory regions where
the sensory input comes into. Your eyes project to
the back of your head, your ears project to
the temporal lobes, your body projects
your somatosensors, touch to this area across
the top like that. The story that’s told how this works. This is the conventional view, which you already guessed is
really not correct at all. But, the way this
classically has been viewed, is you have something like
your eye and you have a retina, and it projects to
this first region, back here. That somehow extracts
simple features, and then that projects
to the next region, which extracts more complex features. In the brain, you do this
three or four steps, and then all of a
sudden, you have cells that represent entire objects. So, it’s a feature extraction
in a hierarchy paradigm. It’s presumed something similar is happening in the other
sensory modalities. It is also not at all clear how, because you don’t have
this regular topology on your skin as you do in the retina. But the same idea is
supposedly happening. Then somehow, there’s a some
multimodal objects occurring in all these other areas that really people don’t really understand
what they’re doing at all. So, we have a somewhat of an idea what’s going on in
these primary sensory regions, but not really very well, and then somehow all this other
stuff is happening up there. So, that’s the story that’s told
in the kind of classic view. If you get a neuroscience textbook,
it will tell that story. Here’s the reality.
Here’s a picture of, we can’t do this for humans, but this is a primate,
this is a macaque monkey. This is a very famous picture in neuroscience if
you don’t know that. This came out in 1991 by these two scientists,
Felleman and Van Essen. These rectangles here are regions
of the monkey’s neocortex. It’s smaller than ours, but
it’s basically the same idea. So, these are all
the different regions, and each one of these lines
here represents millions of nerve fibers connecting
them in various ways. This thing is really, really crazy complicated and it doesn’t look simple like
that thing up there. We can make a few
observations about this. The vast majority of the connections between regions here are
not hierarchical at all. They go all over the place. So, 40 percent of all possible
connections between regions exist, and many regions get input
from 10 or more other regions. This diagram was
made 30 years ago. Was it 30 years ago? Not quite. But we now know there are even far more connections that they missed that are going across
from all different places. So, in this diagram, you have the touch sensors on
the left and the vision regions on the right and they’re
presented as a hierarchy, but it’s not really like that at all. Our brains are like this,
but even more complicated. So, somehow this is the organ of intelligence and we want
to understand all this, which seems quite daunting. So, the next thing
we can do is look at the detailed circuitry
in the neocortex. This is within the 2.5 millimeter
thickness of the neocortex, and see what the circuitry
looks like in there. This has been studied for
well over 120 years now. These pictures were made
by Cajal back in 1899. This is the 2.5 millimeter
thickness at the top, towards the skull and towards
the center of the brain here. What they’re showing in
here back in those days, we’d see there’s these cell types, these individual cell bodies, and you can see there’s
a stratification going on. Then, here they’re showing that
the axons and the dendrites, the process that come out of the cell body that
connect to other neurons. You can see they are
predominantly vertical. So, there are literally,
I’m not joking, tens of thousands of papers have been published on the architecture in the neocortex that’s represented
here first back in the 1800. Here’s a picture that I made which is showing some of the dominant
connections between here. We spent a lot of time
reading these papers. So, you can see these
different layers, they label them one through six, but that’s really a misnomer. It’s a rough guide. There’s these very prototypical
projections that go between different cells and different layers, and there are different types
of connections, indicated those here
by blue and green. This is very much more
complicated than this. But this is a prototypical circuitry, and we can make a few general
observations about this. One observation, there are dozens
of different types of neurons. By different types, I mean they
have different response property, different connectivity
property, maybe different gene
expressions, and so on. They’re roughly
organized into layers. Most of the connections in
the neocortex are across the layers. So, input comes into layer four, and then it goes up, and down, and back, and forth like this. There’s very limited
connections horizontally, there’s a few layers that
predict long distances. But most of the information
comes up and down, up and down, and then this spreads on a few layers at long distances. One of the surprising things is that, all regions in the neocortex, and people didn’t know this until 20 years ago, have a motor output. So we think about,
“There’s a sensory part that gets
an input from the eyes, and then it goes up and
down the hierarchy, and then you do some behavior.” It turns out that
everywhere you look, there are these cells in
layer five which projects someplace in the rest of the body
of the brain and make movements. So even in the primary visual cortex, the first part that gets
input from the retina has cells that project back to an old part of the brain
that moves the eyes. The regions of the auditory
cortex project to parts of the brain
that move your head. So, every region in the neocortex
is sensorimotor tissue. There’s no pure sensory input. It’s sensorimotor all the way, every region is sensorimotor. Now, a couple of observations here: this circuitry and its complexity is remarkably the same everywhere. It doesn’t matter if you look
at regions doing language, or vision, or hearing, or some unknown function,
it looks like this. Now, this is an area of contention because you can find differences
in the different regions. There are some regions
that have more of this cell type and less
to that cell type. Some little bit thicker,
some little bit thinner, some have an extra special
little thing here. But the variation between
the different areas within the neocortex is remarkably small, compared to the commonality. So, that’s an incredible observation. The other thing we
can say about this, is this is a very complex circuit, and it’s going to do
something complex, it’s not going to do
a simple function. This is the complexity
there for a reason. Okay. Now, how do you
make sense of all this? The first guy who made a really
big observation is this guy here, Vernon Mountcastle, a
neurophysiologist at Johns Hopkins. He published his idea in 1978 in a 50 page essay which
is somewhat famous, and it is one of the most
incredible ideas of all time. I put it up there with
Darwin in terms of the significance of his observation. He said, “The reason
that all the regions of the neocortex look the same is because they’re all doing
the same intrinsic function. They’re all doing the same thing, and what makes a visual area do vision, a language area do language, and a somatosensory
area do touch, is what you connect it to.” So if you take a region of cortex and connect it to an eye,
you get vision. If you take a region
of cortex and you connect it to ears, you get hearing. The brain got big by just
replicating the same thing. He then said, “A cortical column which is a little bit
under a square millimeter, I’ll just call it
a square millimeter, but little bit under
a square millimeter contains all the essential circuitry
that you’ll see everywhere. So a cortical column is
the unit of replication. If you can understand what
a cortical column does, then you understand the whole thing. You can visualize it like
this. You can say, “Okay. Here’s the neocortex
and these are columns.” Now, they don’t look like this. You actually can’t see them like this, and not
stacked up like this. But functionally and anatomically, one can argue this is the
way it is and there’s a few places in some brains where
you actually can see columns, but mostly it’s not. It
was more like, “Hey. You won’t see them, but
they really do exist.” So that’s it. So now, in a human, we have
150,000 of these things. We have 150,000 copies of
the same basic circuitry. There’s a corollary to what
Vernon Mountcastle proposed, which I came up with. I don’t know if anyone
ever pointed out. But if you actually
believe what he said, that every column is doing
the exact same thing, then by definition, every column must perform the same functions
that the entire neocortex does. There is no other place
for things to happen. So, if I’m going to have prediction occurring
some place in the brain, it’s going to be
occurring every column. If I’m going to learn sequences and be able to play back sequences, that’s going to occur in
every column, and so on. But this is such a crazy idea that most neuroscientists really
don’t know what to do about it. It sounds like magic. But that’s what he proposed,
that turned out to be true. So I’m going to jump forward here. We’ve done many, many years of
research on this and we’ve been piecing apart different
aspects of how neurons work, and how dendrites work, and how synapses work, how the
circuitry in the neocortex works. I’m going to skip all of that and jump to something that
happened three years ago. So we had built up a base of knowledge about a lot of things going on there. Then three years ago,
we had an insight which blew the whole thing open. That insight started
with this coffee cup. I was literally in my office, and I was playing
with this coffee cup, and I asked a very simple question. I was touching
this thing with my finger, and as I moved my finger around, I was making predictions about
what I was going to feel. I said to myself, “What does
the cortex need to know to predict what my finger is going to feel when I move
it and touch the lip?” I can imagine, I can
imagine that feeling. What does it take to know that? So first of all, it has to know
that you’re holding a coffee cup, because different objects would
lead to different predictions. The second thing it needs to know, it needs to know where the location of my finger is in the reference frame
of the coffee cup, relative to the coffee cup. It doesn’t matter where
the coffee cup is to me or it’s orientation
doesn’t matter, but I have to know where my finger is in the reference frame
of the coffee cup. I need to know where it is,
I need to know where it will be after I execute a movement. So I’m going to execute a movement,
and while I’m executing it, the brain has to predict where
the new location will be. Then, based on the model
of the coffee cup, it can predict what it’s going to sense. I very quickly realized that this is a universal property in the cortex. When I’m holding it
with my hand like this, every part of my skin
is predicting what it is going to sense as it
moves over this coffee cup. Every part has to know where it is in the reference frame
of the coffee cup. It has some sense of
location relative to the objects it’s manipulating. This was a new idea. We then ran with this, and we published our first paper
on this in 2017. We explained a lot of the detailed mechanisms of what
we think is going on here. I’ll give you
just the highlights of it. So you have your hand and
your finger touching this cup, and that typically
goes into layer four, and this is the primary
input layer in any region. That’s your sense feature. This is all the very major connection between these cells in
layer six and layer four which are well-documented and has a certain type
of effect on layer four, its indicated by blue here. We propose that this is representing the location relative to the object. Now, we have two things here; the sensation and the location
relative to the object. Then we can integrate over time and form a representation of
what the object itself is. So if you can think
about it, you’d think, the cells here, their activity is going to represent the coffee cup. These cells are going
to be changing based on both the sensory input and
where it is in the cup. You can sense they integrate over
time and build a model of a cup, it’s really a model of the cup, the morphology of the cup. It’s like what features
are at what locations, and your brain has to know this. We detailed the actual mechanism, how the neurons do
this in quite detail. I’m not going to go
through that today. [inaudible] will touch
on a little bit later. We then say when you’ve got
multiple columns going on here, and so imagine you have
different parts of your skin touching
the cup at the same time, we just have one finger. In order to either learn
the cup or infer the cup, you have to move the finger around. Imagine reaching into a dark box and you try to
recognize what this is, you have to move
your finger to do that. But here, if I reach
in and grab it with my hand at once, I
don’t have to do that. I can recognize the cup
with one grasp. The reason that is is because there’s these long range connections
in this upper layer here, layer two, three, that go across
large areas within your cortex. These represent a voting mechanism. So each column is getting
some different input,
and has a guess as to what
it might be feeling. It doesn’t really know, but they
vote together and they settle on the only thing
that’s consistent with the locations and
the sensations they have. So we modeled this, and we
showed how it works.
different, but it’s not. The retina, the best way
to think about the retina, it’s just a topological array
of sensors just like your skin is a topological
array of sensors. Each column in
the primary visual cortex is looking at a small part
of the visual object. Nobody looks at the whole thing. So each part of
the visual cortex is looking through a little straw,
it’s like a fingertip. If I had to look through a straw, I have to move the straw around
to see what I’m looking at. But if I have all of
those columns active at once, bingo, I recognize the object. So we walked through all
the details of this, but there’s a big question. How is it possible for neurons to establish a reference frame for an object when they don’t
even know what the object is, and then know where
it is on that object? How could that be done
with real neurons? So we proposed in this paper
where to find the answer, and that proposal turned
out to be correct. So now, we’re going to skip to that. It turns out that there is a very well-studied thing in the brain called
the entorhinal cortex, which is not part of the neocortex,
it’s part of the old brain. It creates reference frames. What we propose is that
the cell types that evolved a long time ago in the entorhinal cortex are now
existing in the neocortex. So let me just tell you about grids. These things are called grid cells. You might have heard of
them, they are very famous. They were first discovered
by the Moeser team in 2005. What they do is they create reference
frames from an environment. It’s been studied in rats mostly. In blue here is the entorhinal
cortex. But we have it, too. You and I have
a small entorhinal cortex, and it has these grid cells in. What I mean by an environment
is like this room, this room is an environment. So my grid cells right now have established a reference
frame in this room. Their activity tells me
where I am in a room. Even if I close my eyes
and I walk over here, I have a sense that I’m at
a different location in the room. I know I’ve moved. I know
the old spot was over there. I know I’m a little bit
closer to this wall, further from that wall. So even with you, at
any sensory input, there’s an internal system for tracking where you are in this room,
and that’s what’s happening. So they create this reference frame, they represent the location of the body in the room
or the environment. This is useful for
building maps of rooms, like where are all
the things in the room. For navigation, like how
far am I from that door, and how many steps would I
have to take to get there. The grid cells provide
a metric space for doing this. This evolved in the old part
of the brain for navigation, which is one of the first things
that animals have to do when they start to move
around the world and navigate. Our hypothesis was that grid cells also exist in
the neocortex in every column. Now, they’re creating
reference frames for objects that you interact
with in the world, both physical and abstract objects. They represent the location of that cortical columns input
in that reference frame, and they’re needed for
learning the structure of objects in the world, and for navigating our limbs and different parts of our bodies
relative to those objects. So we have now detailed different parts of this
as a series of papers. This is a visual picture of
the same thing I just told you. So here’s, on the left side,
the entorhinal cortex. These are two rooms
a rat might be in. These letters represent
locations in the room. So when the rat’s in this location, these cells have one activity,
we’ll call it A. When they’re in this section,
a different activity called B, and a different section here. It doesn’t matter how I got to C. I can go from A to B
to C or direct to A, C. Whenever the rat’s in
that position or you’re in that position, that activity occurs. The same thing is going
over here, but now, just going back to the fingertip, you have objects like
my coffee cup or a pin, and there was basically
a reference frame of points around this thing; this label, some of them here, and as you move your finger
relative to these objects, those cells are changing. They’re representing
the location of where that sensory input is
relative to those objects. The way grid cells
work is really cool, it’s fascinating, but it’s not easy to describe in a very short talk. So in a longer talk, I could tell you all about it. But just trust me that
it’s a really cool way. It’s not like Cartesian coordinates. There is no zero point here, it’s a self-referential
reference frame, and neurons do this, and nature figured how to
do it, it’s very cool. Okay. So going back: at first we didn’t know how these cells were
representing location. Then we realized we do know. They are grid cells, and grid cells
come in modules, which is why we call them grid cell modules. So in every cortical column, there are grid cell modules. Then last year, we published
a paper which describes in varying detail how these modules
work related to here. There’s an interaction
back and forth, so how do they figure out where you are at the same time
they’re touching? So we worked out the detailed
mechanisms for this. Okay. I’m now going to
jump forward to today. Most of this has been published, but not all of it, we’re
going to do that this year. I’m just going to tell
you the complexity of what we think is going on
in a cortical column. Yes?>>I had a question. You
probably have said this before. If I just give you a cup and
your eyes are closed, you won’t know exactly
where your finger lands, but you’ll still be able to reconstruct the cup by moving it around, right?>>Yeah.>>So that suggests that you need to maintain hypotheses about
the initial location, not just a single location.>>Yes.>>So [inaudible]>>Yes.>>So how does the brain [inaudible]>>Subutai is going to
talk about this. The question, for those who can’t hear
me and people online, is how does the brain keep track of multiple hypotheses. I’m
going to rephrase it. How does the brain keep
track of multiple hypotheses? Because it doesn’t
know where it is yet, but it might have found
an idea where it is. It could say, given what I sensed, I can be in these sets of things,
in these sorts of places. Right? One of the discoveries
we made a number of years ago, and [inaudible] going
to talk about this so I’ll just give you a clue about it, all of the activations in the brain are sparse, meaning most of the time, very few cells are active
and most cells are inactive. The way the brain represents uncertainty is it
forms unions of patterns. It activates multiple patterns
simultaneously at the same time. Because they’re sparse,
it doesn’t get confused. Surprisingly, it’s not obvious, you might [inaudible] up front, but the brain can actually activate multiple guesses at the same time
in the same set of cells. What you see in the neural tissue, when there’s uncertainty, you
have a lot higher activity rates, and when you’re certain,
it gets very sparse. So basically, there isn’t
like a buffer someplace. It’s not a classic
probability distribution. It’s literally a union of hypothesis that are
happening at the same time. The mechanisms we showed here show
how that union gets resolved. That’s an important part
because we think the way the brain represents
information is really critical, and Subutai is going
to talk about that.
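As a rough illustration of that union idea, here is a tiny sketch with made-up sizes (not measured biology): several sparse patterns are ORed into one set of active cells, and because each pattern touches so few cells, each individual hypothesis can still be matched against the union without confusion.

```python
import random

# Illustrative sketch (assumed parameters): uncertainty as a union of sparse
# patterns. Several sparse guesses are ORed into one activity pattern; because
# each pattern uses few cells, each guess is still recoverable from the union.
random.seed(0)
n, w = 2048, 40                       # cells, active cells per pattern (sparse)

def sparse_pattern():
    return frozenset(random.sample(range(n), w))

stored = [sparse_pattern() for _ in range(50)]   # known patterns ("objects")
hypotheses = stored[:3]                          # current uncertainty: 3 candidates
union = set().union(*hypotheses)                 # what the tissue shows: denser activity

theta = 30                                       # match threshold (< w, tolerates noise)
matches = [i for i, p in enumerate(stored) if len(p & union) >= theta]
print(matches)   # [0, 1, 2]: the union matches only the patterns it contains
```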
Okay. I think my answer is sufficient for now. Let me just say, I’m not going to explain all of this; I’m
certainly jumping through it quickly. What we now believe is going
on in every column is that there are actually
two reference frames. There are these two sets of
cells, in layers 6a and 6b. So, it’s able to represent
two spaces at once. What does the column learn? First of all, when it’s
observing something, it learns the dimensionality
of the object. There’s no assumption about, is this a one-dimensional or two-dimensional or three-dimensional
or n-dimensional object. It can learn the
dimensionality of the object. In that dimensionality, it can
learn the morphology of the object. What features exist at
what’s points in that space? It can learn the
changes in morphology. We detailed in this paper last year
of how this could go about. But literally, here’s a laptop and
when the lid goes down and up, I have to learn that; that’s
a change in morphology of this. So, this thing has
to be able to learn those things as like
the behaviors of objects. It’s able to learn both compositional
and recursive structures. So, here’s an example of
a compositional object. I have this coffee cup,
and there’s a logo one. The logo was learned before I
had a coffee cup and had a logo. Now, I go into a new object with
the coffee cup and the logo. I don’t want to have
to re-learn the logo. I want to be able to assign
the logo to the coffee cup, and so that’s a compositional object. It also can learn
recursive structures. In the sense, I can have
a logo with the coffee cup, and the coffee cup can have
a logo on it and so on. Recursive structures are essential for language, among other things. So, this of course
also has a motor output. So, it generates motor behaviors, and it can apply to any kind of object, a physical object or
an abstract object. If you were a cortical column, you do not know what
your inputs represent, and you actually don’t know what
your motor outputs represent. Your just this bunch of neural tissue up there
and you’re trying to figure out how to model
the input space you’re given, given some ability to
generate some behaviors. So, if you attach this
to a part of the retina, you’ll learn very simple
visual objects here. If you attach it to some output
of some other regions, you might get
some very abstract things. Okay. So we’re just going
to leave this moment. You have 150,000 copies
of this in your brain. So, how does this all
come back together again? Let me go back. This
is where the title of my talk is The Thousand Brains
Theory of Intelligence. I mentioned earlier
the classic view of a hierarchical feature
extraction model, which is underlying basically all convolutional neural
networks these days. But, the brain is not like that. The brain has some hierarchical
structure, but I mentioned earlier that most of
the connections are not that way. So here, the alternative view is that
you have all these columns and they’re all doing the same thing, all doing this modeling. If I sense a cup, like I touch it and I look at
it maybe at the same time, you’re going to evoke
a whole bunch of models at different levels in
this hierarchy, all about the cup. Now, these models are not identical. They vary in different ways. Well, of course the ones on the left, you’re all going to
be vision-related. The ones on the right, you’re going to be
all somatosensory, touch-related. But they’re going to
be based on different parts of your retinal space. So, there’d be some models
of the object built on this part of the retina and some built on
that part of the retina. The same holds within
any of these modalities, like different
parts of your skin. Even here, you might have some
models built primarily on color, others more on
black and white. Here, you might have
some models with more input from temperature, which would tell
you about the material of the surface, others may not.
It doesn’t really matter. You have this array of columns, all modeling the same object. Then these long range connections, which I’m just sketching out here, basically allow
all these models to vote, even at the lowest level, even at the primary visual region in the primary somatosensory region, we find these connections
going across, which make no sense in
a hierarchical model. But they make sense here. Everybody is trying to
guess what’s going on. So, this allows you
to resolve ambiguity, allows you to do inference
much faster without movement, and is why you have
a single percept of the world. You may have thousands of models of the object being observed
at the same time, but what you’re perceiving is a crystallization across
this upper layer which says, “Yeah, we all agree now,
this is a coffee cup.” Lots of different columns are
contributing to this at any moment,
but it doesn’t matter. These different models
can come in and out based on obscuring of data, and so it doesn’t really matter.
We all know it’s a coffee cup. So that’s what we call this, “The Thousand Brains
Theory of Intelligence.” My last slide here is the following. Now, I’m going to turn
it over to Subutai. So, the question is, will these principles be essential
for the future of AI? You could say, who cares, it doesn’t matter
how brains work. There’s a lot of success going
on in AI right now, and most of the things I just talked about are
not part of that. Well, as I said earlier, I believe some of these will be, if we think about
where AI is going and what we want that to be, and how crude our systems are today
compared to what they could be. So, here are some things
I think are absolutely essential in the medium
and the long term. So not the very short term right now. If we really want to get
to the future of AI, you’re going to have systems
that are sensorimotor. They do sensorimotor learning
and inference. You cannot learn the structure
of the world without moving; you can learn only a very
impoverished model of the world. So, we learn mostly by
moving through the world. I can’t learn what this building is unless I move through it. We talked about
the SLAM thing earlier. So, in my mind, AI and
robotics are not separable, they’re not really separate problems. They’re really the same problem. These models are going to be based on object centric
reference frames. That’s very clear to me now. Will they have to do it the way
grid cells work? Maybe, maybe not. I don’t know, but it’s going to
be built around object-centric reference frames, and the way grid cells
do it is pretty cool. So that might be the right way
of going about it. Then they’re going to be
many small models with voting. There are a lot of advantages
to doing it this way. Robustness is one, but
beyond that, I don’t imagine you can really build a true complete AI system
that didn’t work this way. In the short term or in the near term, there’s
some things we can do. This is my clue into Subutai talk. He’s going to talk about
these things right now. I mentioned earlier the
sparse representations. That’s the way the brain works, and that leads to
a very strong robustness in the representational
model in brains. Then there’s the neuron model that we use; I didn’t talk about that. Subutai will talk about that. We model neurons quite differently than
the point neurons that are used in artificial neural networks. The real neurons
don’t work like that. We’re going to talk about this,
but some of these properties of real neurons are essential for
continuous and online learning, and Subutai is going
to talk about that. So, not that you are
going to applaud, but if you were, you can hold it. Now, you have a question?>>Yeah. You described this flat set of sensorimotor grounded columns.>>I didn’t say it was flat, I said it’s mostly non-hierarchical.>>So far. Then you make these sorts
of gestures toward it being hierarchical. So I’m wondering where hierarchy
comes from, and whether a conceptual hierarchy is
also grounded in sensorimotor processing.>>All right. So, you
got the idea that every column is doing
the same thing, right? In reality, if you look at
the visual regions in the neocortex, the first three visual regions V1, V2, and V4 all receive
direct input from the retina. It’s not just going into V1, it’s projecting multiple levels
up, and not to all levels. So, one of the things
that’s going on there is that these different regions
here actually are going to be modeling objects at different
spatial sizes on the retina. So, if I had the
smallest possible thing I can see, like the smallest print
that I could read, I’m telling you that’s
being recognized in V1. Most people would be very
surprised to hear that. Now, in addition to that projecting
of a multiple regions, there’s convergence from V1 to V2, and from V2 to V4. So, there’s both hierarchy and non-hierarchy going
on at the same time. We don’t have a detailed model of how the hierarchical composition
works yet, but that’s roughly
what it looks like. So, it’s not completely flat. It’s definitely not flat, but it’s definitely
a lot flatter than most people think, and very, very few people would think that
even in these primary regions of somatosensory cortex that real
object recognition is going on. I’ll give you a couple of data
points you might find interesting. In a mouse, mice can
see pretty well. They can do a lot of
interesting visual tasks and so on. Mice really have only V1. They have almost no V2. Almost all of their vision occurs
in the primary visual region. In humans and monkeys, if we take a monkey like the macaque
I was talking about earlier, 25 percent of the entire neocortex
in the monkey is V1 and V2. These regions are huge, and all the other regions
are much smaller. That too tells us
that most of what we know about vision is occurring
in these lower regions here. So, it kind of removes this idea that somehow you’ve got
feature extraction going on down here and some big thing happening up there.
It’s just the opposite. The big stuff is happening down here, and then there’s this convergence
as you go up here, and these more high-level concepts. We don’t really understand exactly
how that was formed. Yeah?>>It seems that the
accessible of this is hinges on the ability to hold this model slightly shows different hypothesis.
So a big difference.>>Yeah.>>Sensory variability is just another effect of
topological differences of the brain or is actually there’s an armature that says
you must be different?>>Make sure I
understand the question. Are you saying that
those differences, is that genetic or is it learned?>>Are those differences managed
through higher level of process.>>No.>>Or actually they’re just a
consequence of the brain being self->>Well, let me try to
answer that question. First of all, the basic theory says, the differences are really primarily just where they are and
what they’re getting input from. Much of that is genetically
determined at birth. You have this structure
here like that, and there’s also at birth,
there are differences. I mentioned that, there are
differences between these regions. So, evolution’s figured out that in some regions I might want
a little bit more of this cell at birth, you
have these differences. But primarily, it’s just where they are and what the
topological inputs are. Now, the system is very flexible. So, if you were born without a retina or you’re just
blind, congenitally blind, these regions don’t atrophy. Surprisingly, what happens is, they get inverted, input
starts coming the other way into them, and they actually
operate backwards. So, there’s a lot of learning
flexibility going on here, but nobody’s in charge.>>Are there going to be
consequences observed?>>Maybe I didn’t understand
the question, I don’t understand.>>No, because it will
be that there’s just not enough to put
all this columns together. Then, there’s a successful
physical structure that led to this thing being robust.>>But it’s not precise at all. It’s like you can hook these things
up almost any way, it’ll work. The question is, if you hook
them up a little bit differently, you might
get a better result. So, it’s forgiving. Let me
give you another example. In normal humans, the size of V1, the area of V1, which is one of
the largest regions in the brain, varies
by a factor of three. Now, presumably, what I’ve read is that people
who have larger V1s are better at high acuity vision, probably at really
resolving very small things. They will be better at that. But the people who don’t,
they are not as good as that. I mean, they both see, they
all think they’re normal. Life goes on but some are going to be a
little bit better than others. So there’s a hell of a lot of
flexibility in the system you get, and there’s all kinds of trauma. You’d knock some of these things out. Everything seems to be working. You rewire stuff, they keep working. There’s one guy Mriganka Sur rewired the ferret’s brain
and it kept working. You could tune it, and make
it better but it’s robust, just the way you do it. Yeah?>>Just a quick question. So every column has a sensory input
and a motor output?>>Yeah. Well, it has an input. It’s not all sensory. It could be for other columns.>>Right. But are
they usually related? For example, is the input
that comes from my finger coming to a place
where there is a motor output?>>Yes. Yes. Yes. That’s right. So in this case, in the case of the finger, literally, the part that’s getting input from the fingertip projects part
back down to the spinal cord, which would be most likely connected to the muscles that would
eventually move the finger.>>Right.>>In the retina, it’s a little different because the retina
moves all at once. So, in the retina, all the columns in the visual cortex project to this one thing called
the superior colliculus, which moves the eyes, and they all vote. So, there’s less of a topology there because I can’t
move my eyes differently, or different parts of
my eyes in different ways. But that’s certainly true with
the somatosensory system. There’s a topological
arrangement to that. Yes, over there. This is only half the talk
just to remind you. I will take this one last question
then we’re let you as I go.>>Are the individual columns
really topologically separated? Or is it just the convenience
that we isolate?>>It’s mostly a convenience. Remember this picture here? I said this is not what
it really looks like. You don’t see this. But there’s a lot of evidence that if you were to move
a probe across the cortex, even though you don’t
see these columns, there’s a lot of evidence that
there is a physiological break. Not a physical break, you
can’t see a dividing line, but there is a physiological break between what this section
here represents, and what this section
here represents. So, that’s been documented, and there’s a few places like in
rodents’ somatosensory cortex. Their whiskers are
a very special organ. It’s an active sensing organ. It’s almost like
our fingers and so on. Rats and mice move their whiskers in an active way and in
the barrel cortex, they call it, in the rodent, you actually literally
do see columns. There’s one column per whisker. It’s a direct evidence that
this is what’s going on. The whisker is like a fingertip. If the rat has an array of whiskers, then it’s got an array of
columns and those, you can see. But mostly, it’s
a physiological change that’s going across
the surface of the cortex. So it’s real. It’s not
just one continuous thing. But you don’t see it
physically mostly.>>What about topologically? Are there connections between
the neighboring cells in the cortex?>>In the cortex, I
might end up this. Maybe we can talk about it later. In the cortex, there’s really
only two sets of layers of cells that project
across columns broadly, and I mentioned that those
are both voting layers. Remember, I said
there’s two reference frames, they’re both floating. So one of the layer
cell types in layer five, is on long-range projections and one of the cell types
in layer two, three, there’s a long-range connections, and those guys are all
voting and trying to. Because because local columns will typically be sensing the same thing. So within some area here, these guys are all
sensing the same thing. So this propagation of like, “Hey, what are we all seeing?” happens in some broader area, and so you see those connections
spread over this. But as I said, mostly circuitries within a column. Okay. I’m going to
end it right there, and subitize is going to pick
up, and we’re here all day. So people who want to talk about
this longer, you can do it. Now, you can switch
presentations there.>>As Jeff said, we’ve been really focused on the neuroscience
for a long time. Recently, we’ve started reengaging on here and my talk is going to
be a little bit different. I’m going to focus on
a couple of aspects of what we’re doing in here. The basic idea here is how can we take what we’ve learned from the neuroscience, and
from the neocortex, and apply them to practical systems in a way that we can improve some of
their shortcomings, and maybe improve some of
their current techniques. So, my talk is going to focus on a couple of different
fundamental areas. So, Jeff didn’t touch
on these directly, but a lot of these underlie
the theories that we are working on. So, the first area is robustness, which is a pretty key area. As we discussed, the brain is remarkably robust and resilient
to noise from the outside, internal faults, and
all sorts of things. What we think is that the way that the brain
represents information using sparse patterns of activity is pretty critical in
achieving this robustness. So, I’m going to talk
about exactly how we think about that and
then what we’ve done very recently is incorporate
its sparsity into Deep Learning Networks in a way that mimics what we think
is going on in the brain, and I’ll show you some of
the results from that. The next two are more focused on the learning aspects and I’m going to be talking about continuous learning
and unsupervised learning, and these are big research areas in machine learning as well
as in neuroscience. It turns out that
these two are actually handled in a very similar
way in the brain. It’s really the same mechanisms
and operations that lead to both. So, this is actually
really one topic, not two. What we think is going on here, in order to really explain
that I’m going to dive into a little bit more detail about
how biological neurons work, and how they operate on their inputs, and how they learn. Although we haven’t
implemented this in the context of
Deep Learning Networks, we do have some results
on real-world datasets. I’ll show you how this concept
applies there as well. So, let’s dive into
sparsity and robustness. So, this is a picture
of a pyramidal neuron. This is the most common type
of neuron in your neocortex. Pyramidal neurons have
thousands of synapses, anywhere from 3,000, actually, some places up to 30,000 synapses. What’s remarkable is
that what’s shown here is the dendritic tree where all the inputs are
coming into the neuron. In a single neuron,
this dendritic tree is chopped up into
tiny little segments. Within each segment, as few as eight to 20 synapses can
recognize a pattern. This is remarkable if
you consider there are thousands of neurons that are sending input into it and
this activity is extremely noisy. How can you possibly
recognize patterns robustly using such a tiny fraction
of the available connections? So, this is something that
was puzzling us for awhile, and we think we understand some of the combinatorics and
math behind this now. So, I’ll explain that here. So here is kind of
an abstract view of that and I’m going to consider
Binary Sparse Vector Matching. So, here’s a kind of
a stylized dendrite and here’s a set of n inputs that
are feeding into this dendrite. Now, you can represent the
connections on the dendrite as a Sparse Binary Vector
with n components in it, and a one here will correspond to an actual connection
with a neuron here, and these connections
are learned over time. Then you can also represent the input as a Sparse
Binary Vector with n dimensions to it with a one
being an active unit there. What we care about here is, when are there matches? What are the errors that can
happen when you have matching? So, if you look at the dot product
between these two: it counts the overlap between them, and if the overlap is
greater than some threshold, we say it has recognized the pattern. We can investigate
that simple operation in this context and see
what it looks like.
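To make that operation concrete, here is a small sketch with illustrative numbers (roughly the 24 synapses, threshold of 12, and 128 active inputs quoted a bit later in the talk). The false-positive count uses a standard hypergeometric tail; treat it as my reconstruction of the derivation being skipped here rather than the exact expression from the slides.

```python
from math import comb
import random

# Sketch of thresholded sparse-vector matching (illustrative parameters only).
random.seed(0)
n = 2000                     # dimensionality of the input space
w_stored, w_input = 24, 128  # active bits in the stored pattern and in an input
theta = 12                   # overlap threshold for "recognized"

stored = set(random.sample(range(n), w_stored))

def recognizes(active_bits):
    """Dot product of two binary vectors = size of the overlap of their ON bits."""
    return len(stored & set(active_bits)) >= theta

# An input sharing 20 of the 24 stored bits is recognized despite the missing bits.
noisy = list(stored)[:20] + random.sample(range(n), w_input - 20)
print(recognizes(noisy))          # True

# Chance that a *random* sparse input matches (false positive), counted exactly:
# sum over overlaps b >= theta of (ways to pick b of the stored bits) times
# (ways to pick the rest outside them), divided by all possible inputs.
p_false = sum(comb(w_stored, b) * comb(n - w_stored, w_input - b)
              for b in range(theta, min(w_stored, w_input) + 1)) / comb(n, w_input)
print(p_false)                    # tiny, on the order of 1e-8 for these numbers
```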
So here is a picture of the space of possible vectors, and it turns out the combinatorics of sparse vectors is
really interesting as it relates to robustness
and other properties. So, the gray circle here
represents all possible vectors, and the white circles
represent individual, let’s say dendritic weights
or each dendritic segments. So if you look at X1 for example, you might want to match
patterns against that. Now, there’s a parameter Theta which controls how precise
this match has to be, and the lower the Theta, the more noise you can tolerate. So, if Theta’s really
small you can tolerate all sorts of changes to the vector
and you will still match. The problem, of course, is that as you do that, the risk of false
positives increases. So as you decrease Theta, the volume of the set
of vectors that match, this white circle increases, but the space is fixed. So you’re going to have
much higher chance of matching some other pattern that
was potentially corrupted. So, we can count this and we can figure out exactly what
this probability is. So, for completely uniform vectors, we can calculate exactly the ratio of the white sphere
to the gray sphere. So the numerator counts all of the patterns that will
match a candidate vector, and the denominator is
the size of the whole space. I am not going to walk you
through the derivation of this. But what’s really interesting here is that as you increase
the dimensionality, the size of the space grows much, much faster than the size
of these white circles. So, the ratio of these two, drops very rapidly to zero. What this means is
that you can maintain extremely robust noisy matches with a fairly low Theta with a very
small chance of false positives. So, it’s a pretty remarkable property of these partial representations. So, this graph shows
a simulated version of this. So, here, I have let’s say a
dendrite with 24 synapses on it. A Theta of 12, so you can tolerate
roughly 50 percent noise in here and you’re looking at inputs with different
varying levels of activity. What this graph shows
is that the chance of false positives decreases
exponentially with the dimensionality. So, this is this ratio
that I was talking about. As long as the inputs are sparse, you get extremely low error rates. So, here, if the activity
has a 128 inputs, and you roughly have
2,000 dimensions, your error rate is
down to 10 to the minus eight. So pretty low. The other interesting
thing to note is this horizontal dotted line there. If the activity coming in is dense. So, in this case, about half
the number of units, the error does not decrease. The Combinatorics are not
in your favor in that case. So, the error is more or less flat. The dimensionality doesn’t impact it. So as long as things are
sparse and high dimensional, you get into this really nice regime where things are extremely robust. We wanted to see if
this kind of property would hold within
deep networks as well. Of course, with deep networks, you don’t work with binary vectors, you work with scalar valued vectors. We wanted to see if
the same property would hold there, and it turns out it does. So this is a similar simulation
except with scalar vectors. The combinatorics that I alluded
to still work for scalar vectors, and the dot product, as long as
any of the components are zero, it’s not going to affect the match. So, those basic combinatorics
are still in play. However, with scalar vectors, the value, the magnitude of
the values are important. So, in order to get this nice
regime, normalization is important. You have to make sure
that both of your vectors are roughly in the same range. As long as you are
careful about that, you get the same basic properties. It’s not quite as nice error rates
as in the binary case, but you still get
error rates that decrease exponentially with
dimensionality. Yeah.>>[inaudible]>>Yeah. So here, I’m specifically
focused on dot products, but you could imagine
other functions would work. The key thing is you want
to ignore the zeros, if either one is zero. Yeah.>>But in real life, how is your assumption that things are running?
Because they are probably not.>>Yeah. So, this relies on
a uniform distribution of vectors, and reality is not
going to be all that. So, the less uniform it is, the worst these properties get. So, I flip it around and say it’s the job of
the learning algorithm, and the job of the system
to try to enforce uniform entropy or maximize
entropy as much as possible.>>The evidence in the brain
is that the brain does that.>>Yeah.>>Because of the connection.>>Yeah.>>Yeah.>>The way the innovation
works in the brain is, if you don’t want to save cells
to be active over and over again, you want a uniform distribution of activity even if it’s very scarce.>>Yeah. It’s an important point, and it’s hard to measure
exactly in the brain, but things are extreme, the correlations in general in
the brain are extremely low. Okay. So, how can we put this
into deep learning systems? So, what I’ve done is created
a differentiable sparse layer. So, on the left I’m showing a vanilla hidden layer
in a neural network, so you have some input
from the layer below, you have a linear weighted sum
of those inputs followed by ReLU or some other non-linearity, and the sparse layer that I’ve
created is very similar to that. The main differences
are as follows, So, first of all, the weight matrix, instead of being dense is sparse. So, most of the weights
are actually zero, and they’re maintained
as zero throughout. So, it’s as if those connection
just didn’t exist. The second thing is the ReLU is
replaced by a k-winners layer. What this does is just maintain
the outputs of the top K units, and the rest are set to zero. So with ReLU, you just
maintain anything above zero. Here, we’re maintaining
only the top K units. You can treat the gradient
exactly as you do with ReLU. It’s one for the ones
that are winning, and zero for everything else. But one problem with that is very
easy in this formulation to get a few units that went
out and stay strong, and so then you don’t get this
uniform distribution that you want. So, what we’ve included
is a boosting term that favors units with
low activation frequency. So, there’s some target
level of activity that’s determined by
the sparsity of your layer, and if the average
activation is below that, you boost their chance of
winning in the sorting. Okay? But the output is not
affected by the boosting term, it’s just which ones are
chosen as the winners. Okay? This again helps maximize
the overall entropy of this, and we’ve shown this
in some past papers. So, it’s a very simple construction. You have sparse weights
and sparse activations. You can also create
convolutional layers that are using the same mechanism. In the results that I show you, I did not use sparse weights for the convolutions because
their filter sizes are pretty small, but in principle, you could do
the exact same thing there.
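For readers who want to see the construction end to end, here is a minimal PyTorch-style sketch of such a layer. The class name, sizes, and boosting constants are mine, not Numenta's released code; it is meant only as an illustration of the sparse-weights, k-winners, and duty-cycle-boosting recipe described above.

```python
import torch
import torch.nn as nn

class SparseLinearKWinners(nn.Module):
    """Illustrative sketch: fixed sparse weights, k-winners activation, boosting."""

    def __init__(self, in_dim, out_dim, weight_sparsity=0.5, k=50, boost_strength=1.5):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.k = k
        self.boost_strength = boost_strength
        # Fixed binary mask: most weights are zero and stay zero during training.
        mask = (torch.rand(out_dim, in_dim) < weight_sparsity).float()
        self.register_buffer("mask", mask)
        # Duty cycle: running estimate of how often each unit wins.
        self.register_buffer("duty_cycle", torch.zeros(out_dim))

    def forward(self, x):
        self.linear.weight.data.mul_(self.mask)          # keep pruned weights at zero
        y = self.linear(x)
        # Boost rarely-active units so winner selection stays roughly uniform.
        target = self.k / y.shape[1]
        boost = torch.exp(self.boost_strength * (target - self.duty_cycle))
        boosted = y * boost
        # k-winners: keep the top-k units (chosen by boosted value), zero the rest.
        idx = boosted.topk(self.k, dim=1).indices
        win = torch.zeros_like(y).scatter_(1, idx, 1.0)
        if self.training:                                 # update duty cycles
            self.duty_cycle.mul_(0.99).add_(0.01 * win.mean(dim=0))
        # Output the *unboosted* activations of the winners; the gradient flows
        # only through winners (like ReLU: one for winners, zero elsewhere).
        return y * win

# Hypothetical usage on flattened MNIST images:
layer = SparseLinearKWinners(784, 500, weight_sparsity=0.3, k=50)
out = layer(torch.randn(32, 784))
print(out.shape, (out != 0).float().mean().item())   # torch.Size([32, 500]), ~0.1 active
```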
Okay. So, we’ve tried this on two different datasets: MNIST and the Google Speech Commands dataset. I’ll show those results. Here I’m showing the basic test scores for one- and two-layer dense networks and one- and two-layer sparse networks. For MNIST, state-of-the-art test set accuracy without data augmentation is between 98.3 and 99.4 percent, so both of them are in that range. The dense networks are a
little bit better, as you can see. But what’s really interesting is when you start testing with noisy datasets. This plot shows accuracy as you increase the level of noise in the input. You can see that the sparse networks do dramatically better than
the dense networks here. Here are some examples of noisy versions of MNIST images and the dense and sparse results. Here are images with 10 percent noise, where the two are still about the same; here you have 30 percent noise in the inputs, and you can see that the sparse network still does really well while the dense one doesn’t; and here is 50 percent noise.
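(The talk doesn’t spell out the exact noise procedure, so the snippet below just assumes a simple model in which a given fraction of pixels is replaced with uniform random values; the function names and noise levels are illustrative. It shows how a noise-robustness curve like the one being described could be produced for any trained model.)

```python
import torch

def corrupt(images, noise_fraction, seed=0):
    """Replace a random fraction of pixels with uniform noise.
    `images` is a (batch, features) tensor with values in [0, 1]."""
    g = torch.Generator().manual_seed(seed)
    mask = torch.rand(images.shape, generator=g) < noise_fraction
    noise = torch.rand(images.shape, generator=g)
    return torch.where(mask, noise, images)

def noise_curve(model, images, labels, levels=(0.0, 0.1, 0.3, 0.5)):
    """Accuracy of `model` at each noise level; the 'noise score' mentioned
    later in the talk is essentially the area under this curve."""
    model.eval()
    accuracies = []
    with torch.no_grad():
        for level in levels:
            predictions = model(corrupt(images, level)).argmax(dim=1)
            accuracies.append((predictions == labels).float().mean().item())
    return accuracies
```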
So, that was encouraging, and we wanted to try it on some harder datasets as well. So, I looked at the Google Speech Commands dataset. This is something that Google
released a couple of years ago. There’s 65,000 utterances
of one-word phrases. This is harder than MNIST, and State of the art is around
95-97.5 percent for 10 categories. Again, I tested accuracy
with noisy sounds as well. Then, as before, so I
have two different types of two layer dense networks and then two different
sparse networks. The basic test accuracies are about the same for dense and sparse in this case, but look at the noise score; here you can think of this
as an area under the curve. It’s the total number of correct classifications
under all the noise levels. You can see, again, that
the sparse networks do significantly better
than the dense networks, as we expected from the math. It’s interesting that
this super-sparse network is one where only 10 percent
of the weights are non-zero in the upper layers, and it’s remarkable that it actually even gets
reasonable test scores here.>>Yeah.>>So, this is really fascinating, because there’s a neural architecture search, an automatic architecture search method, we have been exploring. Here at [inaudible] , we’re
finding that if you include sparsity as part of the search
itself, so that you->>Yeah.>>-dwell on regularization
inside your search procedure. You can come up with
much more efficient models, but we have that sparsity at
the level of connections. Like for example, [inaudible] , where everything is
connected to everything,->>Yeah.>>-but when you do sparsity, you find you don’t even need two percent of the connections, and it makes for a much more efficient network, whereas having all the connections makes the network exponentially bigger.>>Yeah.>>So, would you have
any comments on what happens at the connections level? At least empirically, we are finding that, but you have these interesting layers which are sparse within them.>>Exactly, yeah. So in order
to really get the properties, both vectors have to be sparse. So the weight vectors, as well as the input
vectors have to be sparse to really get
the robustness properties, and I don’t know if you
were looking at robustness. You may be just looking
at test set accuracy. There, either one will work, but to really get the noise robustness
properties you need to have both of them.
Both need to be sparse.>>Dense with sparse connections is,>>Is not as good.>>Not as good. Okay.>>Yeah, and the two percent number
is also interesting. That’s sort of roughly the level of sparsity you can see in a lot
of areas of the brain actually. So that’s another interesting factor. But yeah so it’s remarkable. Yeah, go ahead.>>Have you looked at adding noise
within the layers [inaudible]>>Yes. So dropout for example. So I tried training with dropout. It hurts the sparse networks. Another way to say it is you
don’t need to use dropout. For the dense networks, it sometimes helps. You have to really tune the dropout rate, but in no case does it get anywhere close to the sparse networks. So dropout as a regularizer is known to help a little bit and sometimes
help with test set accuracy, but if you use sparse networks, you just don’t need to
worry about dropout. Okay. I’m going to switch
to the second topic here which is unsupervised
and continuous learning. Here I’m going to dive back
into the neuroscience briefly. So here’s our favorite neuron
again, the pyramidal neuron. It’s got about 3,000 to 10,000
synapses as I mentioned, and these neurons have a very complicated kind of
dendritic morphology here, and they have different
kind of functional properties in the different areas. So in this green area near
the center of the cell, the soma, is where most of
the feed-forward inputs go. This acts like your typical neuron
that you’re used to. It’s a weighted sum
plus a non-linearity. These inputs tend to drive the cell and it’s sort of the
classic point neuron. But the amazing thing
is, this is actually only 10 percent of
the synapses on the cell, and 90 percent of the synapses are in these other distal areas,
these blue areas. As I mentioned earlier,
in these areas, as few as 8 to 20 clustered synapses
can detect a pattern, and they generate what’s known as a dendritic spike, an NMDA spike. This spike travels to
the center of the cell, but it does not cause
the cell to fire. So you get this event, this recognition event that
seems to have no impact on the cell in terms of firing rate. But if you look inside the cell, it turns out it does actually prime the cell to fire more
strongly in the future. These neurons can detect hundreds of these independent sparse patterns all throughout the dendritic tree. They’re all completely independent. So for a long time, it was
really puzzling: what’s the point of all these synapses, if they don’t have any direct impact like this? So what we think is going on is
that these dendritic areas, they’re playing different functions. So you have the feed-forward pattern. This defines the basic pattern
that the cell is recognizing. What’s going on here is
that the synapses in these lower distal areas are detecting sparse local patterns of activity of nearby neurons, and these are
acting as contextual predictions. So when some of
these patterns are detected, it’s going to then prime the cell to be firing more strongly
in the future. Then the synapses up on the top
are getting top-down inputs, mostly from regions above. These are also detecting
sparse patterns, and they invoke
top-down expectations. So this is a slightly different
kind of prediction, but it’s also a type of prediction. So what’s going on is that
you have this neuron that’s trying to predict its own activity
in lots of different contexts. Then if you look at
the learning rules in here that people have discovered, there are really three
very basic learning rules outside of this green area. The basic things here are, if a cell becomes active, if it fires for some reason, if there was a prediction
so in the past,>>If it fires because
of the green input.>>If it fires because
of the green input, if there happened to be
a prediction in the past, so some dendritic segment
caused a prediction here, we’re going to
reinforce that segment. So we’re going to reinforce only the synapses in that segment
and not in the rest of the cell. If there was no prediction, but the cell still fires, we’re going to start
growing connections to some random segment on the cell. These connections will sub-sample
from the input that’s coming in. It’s going to be a sparse sampling. If, however, the cell was not active but there was a prediction, that means it was
an incorrect prediction, and we’re going to slightly weaken the segment that caused
that prediction. So three very simple learning rules. Here, just to point out, learning consists of growing new connections, and each learning event
is like creating a sub-sample consisting
of a very sparse vector. Each neuron can be associated with hundreds of these sparse
contextual patterns. Essentially, each neuron
is constantly trying to make predictions and
learn from its mistakes. Notice that there’s
no supervision here. There’s no batch notion here. These learning rules are
constantly occurring. So everything is
continuously learning. It turns out that because these
vectors are sparse, and remember, if you’re in the right regime, they’re going to be
really far apart in the space and not interfere
with one another. So as long as these vectors
are really sparse, there is no interference and
you can keep learning things without corrupting previous values or other learned things. So this is another benefit of these highly sparse representations. So this is a very simple learning scenario.
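(A highly simplified sketch of the three learning rules just described, applied to a single cell at one time step. The permanence values, thresholds, and sample sizes here are illustrative assumptions rather than the published HTM parameters, and the real temporal-memory algorithm tracks many cells, segments, and mini columns at once.)

```python
import numpy as np

rng = np.random.default_rng(0)

class Segment:
    """A dendritic segment: a sparse set of synapses onto other cells,
    each with a permanence value."""
    def __init__(self, presynaptic_cells, permanence=0.3):
        self.synapses = {c: permanence for c in presynaptic_cells}

    def predicts(self, prev_active, threshold=8, connected=0.5):
        """True if enough connected synapses see previously active cells."""
        matches = sum(1 for c, p in self.synapses.items()
                      if p >= connected and c in prev_active)
        return matches >= threshold

def learn(cell_fired, predictive_segment, prev_active, segments,
          inc=0.05, dec=0.01, sample_size=15):
    """Apply the three rules to one cell for one time step."""
    if cell_fired and predictive_segment is not None:
        # 1. Correct prediction: reinforce only the segment that predicted,
        #    strengthening synapses to cells that were active, weakening others.
        for c in predictive_segment.synapses:
            predictive_segment.synapses[c] += inc if c in prev_active else -dec
    elif cell_fired:
        # 2. Unpredicted firing: grow a new segment that sub-samples the
        #    previously active cells (a very sparse vector of the context).
        sampled = [int(c) for c in rng.choice(sorted(prev_active),
                   size=min(sample_size, len(prev_active)), replace=False)]
        segments.append(Segment(sampled))
    elif predictive_segment is not None:
        # 3. False prediction: slightly weaken the segment that predicted.
        for c in predictive_segment.synapses:
            predictive_segment.synapses[c] -= dec
```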
It turns out you can build a network of these neurons, which we’ve done, and you get a very powerful
predictive learning algorithm. I’m not going to walk you
through the details of the algorithm but essentially
you have groups of cells. Each cell is associating some past activity as contexts
for its current activity. Just one time step in the past. It learns continuously, and it
turns out it generally does not forget any of the past patterns because of the
sparse representations. These networks can actually
learn really complex, high Markov-order sequences, which means the current state can be impacted by input that happened
many time steps in the past, even though the learning
rule is Markovian, it’s only looking at
the previous states. So there’s a dynamic
programming aspect to it. Then everything is sparse. So not only can you learn
continuously without forgetting, but it’s also extremely fault
tolerant, these networks. So I’ll just show one simple result, and this was published
a couple of years ago. This network works really well
with streaming data sources. So here’s a case of
New York City taxi demand. So this is a dataset that’s released by The New York City
Metropolitan Authority. You see a typical
weekly pattern here, with seven bumps, and the basic task is to predict taxi demand in the future. If you look at the error of prediction here, we’ve tested our network, these HTM networks, where HTM stands for Hierarchical Temporal Memory. It’s the name of our algorithm. We’ve compared it against
a bunch of other techniques, and the error rate of the HTM is approximately the same
as the best LSTM Network. So they are about
the same error rate. However, what’s interesting is what happens when the
statistics change. The HTM networks, because they
are continuously learning, they adapt very rapidly to
changes in the statistics. So, this is error rate over time. Here’s a case where the statistics
of the sequence is changed. You can see that the error for both HTM and LSTM
goes up pretty high. But then the HTM error rapidly
drops back to the baseline rate, whereas the LSTM takes quite a
long time before it drops back. This is true even if you
keep retraining the LSTM, and you can play with the
retraining window and all of that, and it doesn’t matter. That’s because LSTMs are
fundamentally batch systems and there’s no notion of
recency in the samples, and it takes a long time before the changed statistics are a significant percentage of the overall dataset. If you don’t train on enough data, then the error rate is just
high all over the place. So, the continuous
learning rule that I described is perfect for adapting really quickly
to changing statistics. So just as a summary again here, the way that neurons
operate and the way these dendritic segments
operate leads to a very simple, continuous, unsupervised
learning rule that can learn continuously without
forgetting previous patterns. Okay. So I’ve talked about
robustness and continuous learning. We have a long road map of
things to do as Jeff alluded to. Within robustness, we’ve only tried on relatively
small problems here. So, I’d like to try it
on much larger problems. It’d be really
interesting to test with adversarial systems as well, to see whether the
sparse networks can actually hold up against many of
the adversarial attacks. With continuous
learning, that has not been integrated into
a deep learning system. The algorithms I showed you were just one-layer systems. So I think it can be
integrated in and we can keep the same philosophies in
there as we integrate in, and we can implement these predictive learning rules
within deep learning systems. I think that may help enable continuous learning
and unsupervised learning, in a very rich way in
deep learning systems. Then beyond that, there’s
the full 1000 brains idea. Jeff talked about
the voting mechanisms, and voting across sensory modalities
and across regions will add some really interesting
robustness properties as well as other properties. We want to move to a case
where there’s many, many small models across
sensory modalities that are hypothesizing what
their sensory inputs are detecting, and then voting to
resolve ambiguity there. Then, I think it’s critical to move to scenarios where at every layer
of a deep learning system, you have inherent object
specific reference frames. This is going to allow much faster learning and
much better generalization, because you’ll be much more
invariant to changes in perspective, for example, if you can represent things in
their own reference frames. Okay. One of the reasons we’re here is to see if there’s
any opportunities for collaboration, and we’d love to discuss with
any of you if you’re interested in any of these ideas and see
how we can apply them together. Here’s some ideas for
possible projects. In the range of applications, I mentioned this idea of testing robustness and adversarial systems. This is not something we
have a lot of expertise in, it would be great to work with
someone who is looking into that, and there are tons of
security implications here as well. I think we could test
with different domains, such as robotics, natural language
processing, Internet of Things. As we incorporate these things
as differentiable systems, they can be applied to just about any deep learning
architecture and paradigm, including of course
recurrent networks and reinforcement learning and so on. So it’d be great to work with
people who have expertise in that, and really see how far we
can scale these things. Then specifically, in
terms of larger problems, MNIST and Google Speech Commands are still relatively toy problems. We want to attack much larger problems. As a small lab, it’s really hard for us to do ImageNet-style stuff, so we’d love to work
with anyone who wants to scale these networks. There’s a lot of really
interesting things that can be done in terms of acceleration
and power efficiency. I didn’t really talk about the power advantages of
sparse representations, but they’re pretty dramatic here. Unfortunately, sparse computations are not very well suited to GPUs; they’re really well suited to FPGAs and other architectures. So I think accelerating these things is going to be
of you guys are interested in these things and can talk to us. Then there’s a picture
of our research team and our contact info
here. So thank you.>>So I don’t know how
you want to handle it now. What time is it by the way?>>It’s about quarter to 12.>>So we went over our hour.>>It’s only 11:46, so.>>We’re here so.>>If people just want to hang out
and ask those questions, great.>>There’s actually
one remote question.>>Sure.>>This is for Jeff. Learning
also happens when we see and watch images and movies
about the world on a screen. Why does it have to involve the corporeal sensorimotor system? Is it understood how
the brain is switching between the physical world
and these projections?>>I think the question, if I
understand it correctly is, when we look at a flat screen
even with maybe just one eye, so we don’t even have
stereoscopic vision at all, how is it that we rebuild this depth model of the world and so on? That’s a great question. We have some clues as to the answers to that. I talked about those grid cells
as representing location. It looks like, not only are they driven by your
internal motor commands, but they can be driven by
sensory cues, such as flow. By the way, I use this example a lot. Imagine you’re watching
someone play a third-person or first-person shooting game;
and you’re running around. You’re following this thing and
you know where that player is, and you know where
they are on the map. So all that’s happening, the whole grid cell thing and
all those locations things are happening even though there
isn’t this depth thing. So it doesn’t require that you have
some three-dimensional camera. It doesn’t require
stereoscopic vision. What it requires is that there are sensory cues such as flow, and if you look in the neocortex, these things are highly
represented in the cortex. So, various types of
vector fields that are occurring are driving the system even though there isn’t
a three-dimensional sensor. It’s an interesting question, why I’m looking out at you right now, why do you appear there and
not on my retina, right? My perception is that you are there even though you’re
actually on my retina. So it’s the same problem
really in some sense. How do I know you’re
there and not here? It’s because of
these sensory cues that are really extracted early on. If you look at
all the sensory streams, both tactile and vision, they have this sense of motion
and flow built into them. So you have tactile sensors which detect movement, and you have that in your vision and your retina as well. So I think that’s a general answer
to that question, but maybe not a super detailed answer.>>People who went to the cinema like
[inaudible] and some of the wear.>>Now the train is going to
come out. Yeah, they went, “Oh my God, the train
is going to hit us.” It’s a fascinating thing.
I just tried to expand it. Just realize how crazy it is that you perceive
everything at a distance, all the time. How is that? I mean, in hindsight it’s so obvious that everything has
a location representation. This was not obvious three years ago. We just say, “Oh, I know what’s out there. Yeah, sure.” How? But, how do you know what’s out there? What kind of neurons are representing it? Anyway, it’s a great question, and we don’t really have
all the answers to it but that’s the basic gist of the answer. Yeah.>>When you go to language which is, I guess, less physical?>>Yeah.>>What role do
the columns play there?>>So, the question is about language, which is less physical. How do we understand
this theory in terms of that? We don’t really understand, but I’ll give a couple of clues here. First of all, language
consists of words, whether they’re written words, spoken words, or sign language. Those are all physical objects that you can model. So, you’ve got columns in your auditory cortex that are building
auditory models of words. You’ve got columns in
your visual cortex that are really building visual models of the world, and they can vote. So, that’s why I can
hear a partial thing and see something, maybe watch,
and I can put these together. So, we start off with atoms of language that are
really physical objects, but then how do they get
their conceptual nature? We don’t know the answer
to that question yet, but there are some very
interesting things that happened recently
in neuroscience. We derived this theory
based on the question: how is it that I touch
objects or see objects? There is a whole other set of research that has just come out in the last couple of years, where they’re studying humans
using fMRI and they’ve shown that even when you’re thinking about conceptual
objects in your head, you’re imagining various things, there’s evidence that there
are grid cells underlying it. So, they’ve discovered this using
very clever imaging techniques, where you can sit in an fMRI machine. I had a slide that came up earlier that shows this with birds. So, no one really
understands this yet, but the evidence is very very clear
that this is what’s going on. So, how do you map conceptual objects like words
into this location space? We don’t really know. Yeah. I mentioned
the issues of recursion, which is a big part
of what languages. If you read about
[inaudible] and so on. So, all these things are
triangulating saying, “That is what’s going on. We just don’t understand
it yet completely, but we do know it’s going to
be based on location frames, it’s going to be based on
recruiting location frames. We have all this evidence
which is triangulating on that.” That’s just a fascinating
thing to think about.>>Of course, the circuitry
of the brain that is responsible for language looks
identical to the visual cortex. So, it has to be the same function.>>So, to me, the big hurdle we overcame was just understanding how this reference frame concept
applies throughout everything. Now, it’s more turning
the crank and going down in these different pieces
and explaining how it is that we do
all these different components, and how do we put together
a broader theory of concepts and language
and abstract concepts? We don’t really know yet, but it’s all in front of us.
All the pieces are there. So, you just have to think about them correctly.
That’s my thinking.>>So, is there any part
of HTM that talks about different brain regions and
why that structure exists?>>When you say
different brain regions, outside of the neocortex? Or like the different parts?>>[inaudible]>>Well, within the cortex, we don’t explicitly state that, but essentially, I
alluded to it earlier. A column has some input and
it’s going to model that input. It’s going to essentially build a sensory motor model of that input. That input can’t be very large; it has to be fairly small. It can’t have a very super high
dimensional input to that thing. So, what you see is that
topology in the brain, is that some part of the retina projects to a single column, another part projects to another single column, and so on. So, if you could build
a visual system with one column, and it would be like
looking through a straw, and that’ll be
a complete visual system, and it would learn by moving
the straw around, and it would be like moving this little window around and looking at
stuff and that would work. It’s pretty straight-forward
just to expand out, to have a whole bunch of those
working at the same time. So, we haven’t modeled that per se. Our focus has really been on, what does the column do? With the belief that once you understand what the column does, the rest of it becomes pretty easy. The really tricky part is what the column does. So, we were not able to scale up to that sort of thing; we abandoned all that a while ago. We just said, “Let’s just
focus on one column, let’s just look at one column. We’ve just got to nail
that one column.” So, all of our simulations
have been very small. That means no more than hundreds or thousands or tens of thousands of neurons, all contained in a single column. To scale up to human brain size stuff, A, it’s not theoretically
important at this point in time, but also we don’t really
have the ability to do that. But I think it’ll be
fairly simple to do.>>Okay. On the flip side, do you have any notion
of mini columns?>>Yeah, we do. I didn’t
talk about mini columns. For those who don’t know, a mini column is a structure. These are physical structures. You can see them in the neocortex; they are somewhere between
30 and 80 microns wide. That’s really small.
There are several hundred of them in a cortical column. Just one point to note: mini columns are
only visible in primates. So, people who study
rats don’t see them. That doesn’t mean they don’t exist; you just can’t see them. So, the functional equivalent can be there, but they don’t see them. The network that Subutai
mentioned earlier, and he said he wasn’t
going to talk about it, which is the one that sort of learns sequences and is built on this neuron model we have. In fact, all the networks we talk about that I mentioned here, how all this stuff works, are built on mini columns. Whoops, what was that?
That was my phone. I can tell you briefly what we think. There’s a whole bunch
of parts of this, but a mini column is essentially
one of the ways we use it is that all the cells
in the mini column have the same basic feed-forward
response properties. So, if I find a V1 neuron
that responds to an edge, this is known neuroscience. All those cells have the same sort
of visual response property. But in the context of a real-world animal moving about and observing real things, it becomes very sparse. So, at any point in time, only one of those cells
becomes active. So, it’s a way of taking an input
and sparsifying it in context. So, if I had no context, all the cells become active and I basically say I have detected some set of features. In context, you say, “I have a unique representation of that input.” It’s the same input, but it’s a very, very unique representation. All of this is in the 2016 paper, all detailed in gory detail, how this works. Yeah. Right at the moment, I’m expanding the concept of mini columns because
I actually think, although I’m not certain of this yet. I talked about
these grid cell modules. I’m working on the idea
that the grid cell modules may actually be one per mini
column. They’re very small. So, each mini column could be representing not only a feature
but it could also be representing a part of
the location space. So, it’s integral to the structure. None of this has to be done this way in an artificial
neural network. You don’t have to have
these physical structures. You can arrange things
anyway you want. But, in the brain, it looks like
we do have roles for mini columns.>>So do you think the grid cells could provide some of the context?>>Yes. That’s exactly what they
are. This was the clue. So, there’s two ways we did
that, with the sequence memory. Remember, we were talking
about the taxi data? The context there was the
previous state of the same cells. It’s like, “Where am
I in this sequence? This previous state tells me what
to predict in my next state.” All we did, to get this whole
sensory motor inference system working is add another input,
which is the grid cells. So, the grid cells
are basically saying, “You can look at your previous state, but you can also look
at the location.” So, which one of
those works better for you?>>So it’s a spatial integration?>>Yeah. Spatial and temporal. And it learns on its own. That layer of cells, we believe, will actually learn on its own which are the proper contexts in which to make a prediction.>>Yeah.>>So you said you had
a very small team and no wet lab.>>No wet lab. We have a lab.>>Several times you said that you had this idea and it proved to be correct. So, can you talk a little bit about the process of how you go about showing these->>Well. Okay. So I
talked about it briefly, but I’m just going through it again. There’s two basic ways. One, is verifying via empirical data, and the other’s via simulation. So the empirical data, the first thing we do is we go
through existing literature. There are just incredible amounts of existing literature
that nobody knows about, just because we all forgot about it. So we say, “Hey, we have this idea that dendrites
ought to do this.” Can we find it?
Someone find that one. Can we find falsification for it? We very often can find
falsification for our theories and we go
back to square one. We don’t accept anything which
is not biologically accurate. Then we go and talk to
the experimental labs. And we say, “Here’s what we think, what do you guys think about this?” Sometimes, they can find data
that they haven’t published. They’ll say, “You know what, yeah, we have this data but we didn’t think about it that way, but
let’s go look at it.” Or they published it and
it has been very fruitful. So we have collaborations with labs. Some people want to test
our predictions with new experiments, but that takes forever. A typical IRAD experiment could take two years from start to finish. And so some of that’s going on, but we can’t wait for it. There’s another thing
I’ll point out that there’s something I like to use, which some people think you shouldn’t do. But the number of constraints you’re satisfying at any point in time with the theory is an indication of how good it is. So if there are 25 constraints, like these are biological constraints, we know the neurons work like this, and then someone says do this, and this, and this, and this. At first, it makes your theories
much, much harder. How do I satisfy all these
constraints simultaneously? It’s just almost impossible, but when you actually get an answer which satisfies many constraints, you’re almost certain it’s right. And that has proven itself over and over and over again. It’s not proof, but we work on it. And if you loosen up your constraints and just go for some biological inspiration, you can come up with anything. If you really want to
satisfy the real biology, it is really hard. It is not easy to come up
with solutions. Trust me. And so it takes us a long
time to come up with solutions to these problems,
sometimes years. Yeah.>>My question is more
of this [inaudible]. So obviously what you are trying to do is kind of translate some of those insights into some
applications [inaudible] , right?>>Yeah.>>So one thing that kind of stands out in this particular
talk is for example, the column or mini column, whatever, is extremely
important, right? But isn’t it time to kind
of upgrade the model of the neural net? Like, we are still dealing with individual neurons. And now, we’ve learned that those kinds of very aggregate units are extremely important as units in their own right.>>Yeah.>>And I don’t think I have
ever seen a neural net model built on the notion of a column
or something like that. Did you ever try it?>>First of all, I 100
percent agree with you. I think we have to move beyond just layers of simple point neurons. And it’s sort of what
I alluded to here. For me, the first step was really making sure we
have sparsity handled. And now, I’m focusing on
integrating the neuron model. So not just the point neuron, but the dendritic structure in there, and then larger-scale structures like the mini column and cortical column. And that’s going to be necessary for including object-centric reference frames, for including motor input, and for having a true sensory motor predictive system. And I think this is going to really pay huge dividends down the road.>>I mean we think
that’s the future of AI. So I’ll just be really clear. We think this architecture
we’ve talked about here, if you look out
20 years from now, this is what people are
going to be building. How do you get there? How do you move people
in that direction? And these are complex problems, complex things to implement. So we’re thinking through that process. So how do we get there? How do we start, one step at a time? If you think
about what Geoff Hinton did recently with his capsules. He had this intuition that some sort of location and relative location stuff
had to be important, but he didn’t have the sort of deep neuroscience knowledge that we have. And so we have a much richer idea of what’s going on there than he does. But it’s the same basic intuition: that we need to go to a different sort of representational framework, one that incorporates much more sophistication in
each column if you will. Right now, the traditional
artificial neural networks are really very simple. It’s almost like
one neuron [inaudible]. And that’s why you have
to have a hundred levels for a kind of limited problem.>>Yeah. Maybe I’ll point out
convolutional neural networks were originally inspired by
the biology a little bit. And so, you have your filters, your feature detectors, followed
by a pooling step. Well, that corresponds
in this kind of diagram to input coming
into layer four, going up to layers two and three, and then up to the next level. If you count
the number of synapses in a cortical column that
match that model, it’s less than one percent. It doesn’t match 99 percent
of what’s going on in our brain if you look at
the individual connections, so all of this other complexity
has to be incorporated.>>These two layers, without the blue part, just this input here, and input to there and out.>>That’s basically a convolutional neural net. And so, you have to incorporate all of this structure eventually, we think, in order to get truly intelligent systems.>>Have you seen anyone who has linked that?>>No. Well, capsules is the
closest that I’ve seen. And it does incorporate some
of these intuitions in there, but by and large now, I think most people are stuck with.>>So you should know
that this theory we present here was born of an insight three years ago. The first publications were a year ago. We just published this omnibus paper about it in December. I presented it to a large group of 700 neuroscientists in October in Europe. This is all very new. People haven’t been thinking about this idea that there’s location stuff. You won’t find it anywhere in
the neuroscience literature other than a draft. So we’re just starting on this path. And one of the reasons we’re here
is to see how quickly we can get people to work with us on this and adopt these ideas, and to see how people react to the ideas. What do you think of this? Do you buy it? Do you believe it? Does it bother you that your systems don’t work like this now?>>This is a tremendous
research roadmap for machine intelligence in general. There are so many rich ideas in here. And we know this is how
the brain works now. We have strong confidence
in that, and we can point to each of these structures and the exact benefits it’s going to have down the road for practical systems. So things do have to move this way.>>Is that omnibus [inaudible]?>>It’s the 2018 one.>>Yeah. If you go to numenta.com, there’s a papers section, so they’re all kind of listed there.>>That was in Frontiers; it came out in December.>>[inaudible] for lunch with Eric.>>Okay.>>Because Eric will have
to leave soon [inaudible]. Unfortunately, we will have
to cut this a little bit.>>You’ve got our e-mail addresses then.>>So please feel free to reach out and [inaudible].>>Yeah, yeah.

34 Replies to “The Thousand Brains Theory”

  1. How about some working code that people can play with, expand, and learn from, say for a game engine, simulator, or classification/segmentation….

  2. The first part of the presentation is like Hinton’s Capsule Networks except for lack of mathematical model and rigorous thinking.

  3. Remarkably he retained many ideas since writing "on intelligence" (2004). No word of Hierarchical Temporal Memory anymore but same focus on cortical columns and the idea of motion being integral part of perception. Also same propensity to theorize orders of magnitude beyond what current experimental results suggest. Big fan.

  4. Fascinating and very promising biological-oriented approach to overcome current deep learning limits. Our neocortex is an invaluable source of inspiration to build AI models with strong noise robustness and continuous learning by design.
    I'm eager to see the next steps involving "voting" between cortical columns and cortical-subcortical interactions (thalamus, basal ganglia, hippocampus, …)

  5. Great video, but he's missing something huge.. there are radio frequency connections too.. nothing physical, just parts of the brains neurons communicating remotely with each other wirelessly. A team of researchers studying the brain have discovered a brand new and previously unidentified form of “wireless” neural communications that self-propagates across brain tissue and is capable of leaping from neurons in one part of the brain to another, even if the connection between them has been severed.

    The discovery by biomedical engineering researchers at Case Western Reserve University in Cleveland, Ohio could prove key to understanding the activity surrounding neural communication, as well as specific processes and disorders in the central nervous system.

  6. It may be that AI systems will be nothing like the human brain in the same way that jet planes are nothing like birds.

  7. Subatai's presentation was so much more useful to me as an ML researcher. Great ideas and hope to incorporate them in the future.
    Also the slides really help understanding: https://www.microsoft.com/en-us/research/uploads/prod/2019/03/42804_The_Thousand_Brains_Theory.pdf

  8. This is an very interesting idea. I feel there are a lot of follow up questions:
    1. How is information integrated across the different columns? Is voting sufficient for this? What is the purpose of the voting, is it to decide which motor outputs to take? It blows my mind to think that we basically consist of a lot of small computers, which all co-ordinate a set of motor outputs, but it does certainly seem both feasible and a somewhat novel idea.
    2. How does memory work? The hippocampus is critical for the formation of long term memory, and is not part of the neocortex. Memory is critical for our sense of self, and thus you could argue consciousness (although I think people place too much importance on the idea consciousness in cognition, but I digress).

  9. Maybe consensus is the part of the secret sauce behind being aware or "conscious". This is something that wouldn't be just invented by people without some inspiration from biology so understanding more about our brain cannot hurt.

  10. Please show the presentation when the speaker is explaining its content. I don't need to see the speaker's face when he is looking and pointing at the slides, in case you don't already know.

  11. 1:00:41 my understanding is that the central terminating synapses are stimulant, and the dendritic synapses inhibitory. Really interesting to hear the way you described this part! Your words were that they prime the cell to fire, this makes a lot of sense. Calcium bridges at the synapse are then formed in that rare case a dendritic action potential leads to a main axial firing.

  12. Amazing content, but the production is an embarrassment. For such a spatial topic why isn't the camera focused on the presentation. Microsoft Research you flunked this video, redo the video with Sight and Sound included, thank you.

  13. If people can know what will happen in the future is a foolish! Time & time again every thing has it's own time! & THIS IS ALL VANITY!!! all things must past! So let us not be foolish !!Thank you

  14. It occurs to me that towards the end of puberty there is a massive culling of neuronal connections that leads from the adolescent brain to the adult brain.

    That culling makes things more sparse, and given the emphasis on the importance on sparseness in this video, I wonder if it has a similar relevance in improving the brain overall by culling connections.

    Also does any of this shed light on sleep? Is sleep just random free cycling within the brain or does it serve some purpose that can now be more precisely defined than the usual assumption of reinforcing wanted connections from the preceding wake period and culling the unwanted ones?

  15. If a cortical column has most of its connections internal, with far fewer coming from outside, that has advantageous implications for building electronic equivalents.

  16. Would have been nice to see more of the presentation and less of the presenter. Micro$oft must have had a middle school intern as their media guy this day.

  17. The vibe in this room is killer. Maybe because the speakers are a little nerveous, but it makes me uncomfortable.

  18. Why is it that in all these talks the F..KEN moron camera guy always points the camera at speaker and hardly at the most important thing, the slides. 99% of the time camera is on the speaker. Go get a brain camera guy

  19. Jeff Hawkins is one of the most brilliant minds in neuroscience and AGI research. He's light years ahead of the deep learning crowd. He's mistaken about one thing though, which is the notion that the brain creates a complex model of the world. There are neither enough neurons nor energy in the brain to maintain such a model. If we had a model of the world in memory, we would have no trouble navigating in familiar places with our eyes closed. Fact is, we need to keep our eyes open. There's no need for a complex model: the world is its own model and we can perceive it directly. We do retain a relatively small number of high-level bits of things we have experienced but only if they are recalled repeatedly. Low level details are either forgotten within seconds or written over by new experiences.

    Unlike deep neural nets, the brain can instantly perceive any complex object or pattern even if it has never seen it before, i.e., without a prior representation in memory. Perception without representations is the biggest pressing problem in neuroscience, in my opinion. Good luck in your research.

  20. I posted this earlier on the Lex Fridman podcast with Jeff Hawkins:
    How much of the brain is redundancy? I ask this because Hawkins talks about robustness, and there is the apparent benefit that ANNs don't need this really. Even if ANNs can't ever be as dense or efficient as brains, they may be able to compensate because of this.
    It appears that current neurons in ANNs are actually just some vague representation of neuronal connectivity and/or action potential. Furthermore, current ANNs seem to just be complicated versions of a SINGLE neuron or several neurons in series, not any real representation of a brain or even a neuronal cluster.
    Firstly, ANNs need to be able to construct themselves relative to the complexity of the problem at hand. They should be able to create layers and nodes as new data is introduced and new patterns are discovered.
    Also, layers need to be more dynamic, as does connectivity among nodes. In recent years, too much effort has been put into making ANNs deeper, which is like stringing neurons in series. Yes, this allows for something approaching "memory" but neglects constructing a more natural form of pattern recognition through weights existing across connections and not the nodes themselves. As Hawkins mentions, sparse connectivity is the goal if we are going to try and mimic the brain, and this can only be done if layers aren't treated as some fixed block.
    Currently, there are only heuristics about how many layers and nodes should be involved, and this can't be right. Being able to construct an ANN relative to the problem at hand is another apparent potential advantage of ANNs. You could potentially have ANNs with a size or complexity that is accurately proportional to the question, as scaling or uniting these for something approaching AGI would take fewer resources.

  21. finally next stage… Its happening … been following the development of Jeff and Numenta since On intelligence. 15 years cant believe it!
