By Gordon Rugg
There have been a lot of major changes in cognitive psychology over the last thirty-odd years. One of the biggest involves the growth of connectionist approaches, which occur at the overlap between neurophysiology and Artificial Intelligence (AI), particularly Artificial Neural Networks (ANNs).
Research in these areas has brought about a much clearer understanding of the mechanisms by which the brain operates. Many of those mechanisms are profoundly counter-intuitive, and tend to be either misunderstood or completely ignored by novices, which is why I’m writing about them now, in an attempt to clarify some key points.
There are plenty of readily available texts describing how connectionist approaches work, usually involving graph theory diagrams showing weighted connections. In my experience, novices tend to find these explanations hard to follow, so in this article, I’ll use a simple but fairly solid analogy to show the underlying principles of connectionism, and of how the brain can handle tasks without that handling being located at any single point in the brain.
The old view was that the brain is physically divided into different parts (which is true) and that each of these parts does something different from the others (which is sort-of true). This looked consistent with evidence from brain trauma, where injury to one particular region of the brain typically damages one particular ability, such as language.
The full story is more complex.
One obvious but profoundly misleading analogy is to view the brain as being like an office, as in the old illustration below.
It’s a wonderful example of its type, featuring roles such as “Superintendent of head movements” and a classic set of gender stereotypes.
It’s also an excellent example of infinite regress and of dodging a key question. This model doesn’t say anything about how the different departments of the brain make their decisions; instead, it shows homunculi, miniature humans, who presumably themselves have brains organised like an office, staffed by miniature homunculi, and so on ad infinitum.
So what would be a better analogy?
Most modern introductions to connectionism use graph-theoretic illustrations like the one below.
This is an accurate representation of how an Artificial Neural Network handles input, but most people find it hard to follow.
In this article, I’ll use an extended analogy.
The analogy in this article involves a classroom of students, to show how a simple voting system can identify animals by using a similar mechanism to that used by the brain. A key feature of this analogy is that the knowledge of animal identification is distributed across the classroom in such a way that no single student is tasked with identifying any single animal.
I’m focusing primarily on showing how processing can be distributed across different places. I’ve deliberately downplayed issues about how the brain and Artificial Neural Networks actually handle distributed processing, for clarity and brevity – this is already a long article.
In this analogy, we’ll assume that the students are shown photos of animals, and that the photographs are reasonable quality and full colour, for reasons which should become apparent later.
We’ll keep the task small and simple, namely learning to identify a single type of animal. This is comparable to real-world cases where an artificial neural network (ANN) is used to identify a single category, such as a traffic light management system that identified oncoming buses and, where possible, turned the lights to green to give them a faster journey.
Voting for zebras
For this analogy, we’ll give the students the task of deciding whether the animal shown in a photo is a zebra. We can make a reasonable start by using just one student.
This student is given two votes, and is given the job of saying whether the animal in each photo:
- definitely has stripes (two votes)
- sort-of has stripes (one vote)
- doesn’t have stripes (no vote)
If the image gets all the possible votes (i.e. two votes), the animal is a zebra. If the image gets no votes, the animal is not a zebra. We’ll look at the situation with intermediate numbers of votes later. So, the rule so far is “If it definitely has stripes, it’s a zebra; if it doesn’t have stripes, it’s not a zebra”.
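For readers who like to see an idea in code, here’s a minimal sketch of the single-student rule in Python. The vote values are the ones described above; the judgement labels and function names are invented for illustration.

```python
# One "student" voting on stripes: vote values as in the rule above.
def stripe_votes(stripedness):
    """Map the student's judgement to votes: definite=2, sort-of=1, else 0."""
    if stripedness == "definite":
        return 2
    if stripedness == "sort-of":
        return 1
    return 0

def verdict(votes, max_votes=2):
    """All possible votes -> zebra; no votes -> not a zebra; else in between."""
    if votes == max_votes:
        return "zebra"
    if votes == 0:
        return "not a zebra"
    return "in between"

print(verdict(stripe_votes("definite")))  # zebra photo -> "zebra"
print(verdict(stripe_votes("none")))      # e.g. an elephant -> "not a zebra"
```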
Let’s suppose that the student sees the set of animal photos below.
For this set of images, the voting system has worked perfectly. The zebra has received two votes; the other animals have received no votes.
Adding complexity, and distributing processing
Now let’s make the task more complicated, by including a photo of a tiger. What happens when the student votes for each photo?
The zebra photo gets two votes. The tiger photo also gets two votes, because it definitely has stripes. However, that photo is not of a zebra, so our system hasn’t been discriminating enough. We need to make the system a bit smarter.
We can do this by adding a second student, who has the job of identifying each animal’s colour as:
- definitely only black and white (two votes)
- sort-of black and white (one vote)
- not only black and white (no votes)
Here’s the voting pattern that we now see for the set of animals that includes a zebra and a tiger.
The zebra gets four votes; the tiger gets two votes; the other animals get no votes. The voting system is now identifying zebras correctly again, and is also correct when it says that an animal is definitely not a zebra. However, the tiger is now in an in-between category. It’s not a definite zebra, but neither is it definitely not a zebra. We’ll return to this point later.
The voting system is working properly again, but it’s now very different from the previous system in a crucial way. In the previous system, the zebra identification was happening in one specific place, i.e. within the head of the one student involved. In the new system, the zebra identification is no longer localised in a single specific place. Instead, the identification requires two students, who are sitting in two different places. It doesn’t particularly matter whether the students are sitting near each other or far away from each other, as long as they’re able to communicate their votes. We now have a distributed system.
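The two-student version can be sketched the same way. Each “student” is an independent function, and the identification only emerges when their votes are combined; the names and judgements below are illustrative assumptions, not anything from a real system.

```python
# Two independent "students", each voting 0-2 on a single feature.
def feature_votes(judgement):
    """definite = 2 votes, sort-of = 1, anything else = 0."""
    return {"definite": 2, "sort-of": 1}.get(judgement, 0)

def classify(stripes, colour, max_votes=4):
    """Combine the two students' votes; no single student holds the answer."""
    total = feature_votes(stripes) + feature_votes(colour)
    if total == max_votes:
        return total, "zebra"
    if total == 0:
        return total, "not a zebra"
    return total, "in between"

print(classify("definite", "definite"))  # zebra: (4, 'zebra')
print(classify("definite", "none"))      # tiger: (2, 'in between')
```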
The next problem
So what happens if we show this system the photos below?
The second tiger in this photo is definitely black and white, and definitely has stripes, so our voting system would categorise it as a zebra. That’s clearly wrong, so we need to bring in another student. This third student’s job is to look for hooves, with the following voting system:
- definitely has hooves (two votes)
- sort-of has hooves (one vote)
- doesn’t have hooves (no votes)
Here’s how the updated system responds to the same set of images.
With this improvement, the voting system can now correctly distinguish zebras from white tigers. It will also perform pretty well with a wide range of other animals, such as the Belted Galloway cow and the zebra/donkey hybrid in the images below. It’s particularly good at handling in-between categories like the Galloway cow (not truly striped, and with a different type of hooves from the zebra) and the zebra/donkey hybrid (off-white stripes), where it provides a figure for how similar to a zebra these other categories are.
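As a rough sketch, the full three-student system reduces to summing three independent votes into a similarity score out of six. The judgements fed in below are assumptions chosen for illustration, not measurements from the photos.

```python
# Three distributed voters: stripes, black-and-white colour, and hooves.
def feature_votes(judgement):
    """definite = 2 votes, sort-of = 1, anything else = 0."""
    return {"definite": 2, "sort-of": 1}.get(judgement, 0)

def zebra_score(stripes, colour, hooves):
    """Similarity-to-zebra score, out of a maximum of 6 votes."""
    return feature_votes(stripes) + feature_votes(colour) + feature_votes(hooves)

print(zebra_score("definite", "definite", "definite"))  # zebra -> 6
print(zebra_score("definite", "definite", "none"))      # white tiger -> 4
print(zebra_score("sort-of", "definite", "sort-of"))    # Belted Galloway -> 4
```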
Points to note
This simulation may look very simple, and in some ways it is. However, in other ways it shows deep and unexpected properties.
One key property of this system is that the zebra identification knowledge isn’t being handled by any single student. The identification process is being handled by several students, each doing a single simple sub-task, and with their knowledge being integrated by a simple voting system. The knowledge of what constitutes a zebra isn’t localised in any single student, or in any single location in the classroom; there isn’t a “zebra identification” area of the room or a “tiger identification” location.
This is very similar to the way that the brain doesn’t have knowledge located in a single location in the way implied by the old “brain as an office” diagram. Each piece of knowledge is distributed across multiple locations, and can potentially be re-distributed across a different set of locations after events such as some types of brain trauma.
It’s important to note that this doesn’t contradict the well-established finding that injury to particular parts of the brain is associated with damage to particular abilities, such as speech.
Imagine, for instance, that the students are using a whiteboard to record votes, and that the whiteboard is suddenly damaged. The voting system would now have problems, caused by damage to one specific place, but that place wouldn’t be the “zebra identification place”. Instead, that place is just one critical part of a broader system.
Another type of localised damage could occur if the students involved in this system were sitting near each other for easier communication, and something happened to the section of the room where they were sitting. If all three students were affected, the system would fail completely. If one or two students were affected, the system would still be able to work to some extent. This is what happens in many cases of brain trauma.
The partial breakdown described above, and the “not quite a zebra” votes described above, both relate to a concept known in Artificial Intelligence as graceful degradation. When the system meets a case that it can’t handle perfectly, the system doesn’t break down completely and catastrophically, or refuse to do anything; instead, it will give an answer which in essence means: “I don’t know exactly what this case is, but here’s how similar it is to a zebra” (or whatever the system is supposed to identify).
This type of response is usually a lot more useful than the “insufficient data” reply used by computers in vintage science fiction. For instance, it can be used to prioritise cases on a spectrum from “definitely needs urgent attention” to “may need attention some time”. In the real world, this has obvious advantages over a simple “yes/no” response.
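As a tiny illustration, a similarity score makes that kind of prioritisation trivial: the case names and scores below are invented, but the ranking step is the point.

```python
# Similarity scores (out of 6) let us rank cases instead of saying yes/no.
# The case names and scores here are invented for illustration.
scores = {"photo_a": 6, "photo_b": 4, "photo_c": 1}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # most zebra-like first: ['photo_a', 'photo_b', 'photo_c']
```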
So far, we’ve given equal weighting to all of the features used above (stripes, black/white coloration, and hooves). That’s just a simplifying assumption, and we can give some of the features more weighting (i.e. more votes) than others if we want to. Most implementations of Artificial Neural Networks adjust the relative weightings of features so as to fine-tune the network’s performance.
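A weighted version of the voting system is a small change: multiply each feature’s votes by that feature’s weight before summing. The weights below are arbitrary illustrative values, not tuned ones.

```python
# Weighted voting: each feature's votes are multiplied by a weight.
# The weights are arbitrary illustrative values, not tuned ones.
WEIGHTS = {"stripes": 1.0, "colour": 1.5, "hooves": 2.0}

def feature_votes(judgement):
    """definite = 2 votes, sort-of = 1, anything else = 0."""
    return {"definite": 2, "sort-of": 1}.get(judgement, 0)

def weighted_score(judgements):
    """Sum each feature's votes multiplied by that feature's weight."""
    return sum(WEIGHTS[f] * feature_votes(j) for f, j in judgements.items())

# White tiger: definite stripes and colour, no hooves.
print(weighted_score({"stripes": "definite", "colour": "definite",
                      "hooves": "none"}))  # 5.0 out of a possible 9.0
```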
One of the fascinating results from connectionist research is the way that practical issues now make much more sense than they did previously.
Looking inside black boxes; it’s no longer turtles all the way down
A problem with early models of the brain was that the earliest models contained a lot of black boxes, where the mechanisms of how things happened within each black box were unspecified. In reality, the details of a mechanism can have very far-reaching consequences.
One question that might have occurred to some readers involves how the students know that an image contains stripes or hooves. If we wanted to elaborate the analogy, we could handle this by giving each of the three students their own sub-committee of more students, where each sub-committee assesses whether stripes (or whatever) are present, using exactly the same type of task subdivision that the original three students used for the top-level task of zebra identification. So, for instance, the “stripes” sub-committee might have one member whose task was to count colour changes in a horizontal slice near the top of the image, and another member who did the same for a horizontal slice near the middle of the image, and a third who handled the lowest part of the image. A cynical reader might wonder whether this leads to an infinite regress of sub-committees, but the answer is that it doesn’t, for reasons outlined below.
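One member of that hypothetical “stripes” sub-committee could be sketched as a function that counts dark/light transitions along a single row of grey-level pixel values. The rows and threshold below are toy values chosen for illustration.

```python
# One sub-committee member: count dark/light changes along one pixel row.
def colour_changes(row, threshold=128):
    """Count transitions between dark and light pixels in a row of grey values."""
    bits = [value > threshold for value in row]
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)

# A toy "striped" row (alternating dark and light bands) versus a plain one.
striped_row = [20, 20, 230, 230, 20, 20, 230, 230]
plain_row = [20, 20, 20, 25, 20, 22, 20, 20]
print(colour_changes(striped_row))  # many transitions -> evidence of stripes
print(colour_changes(plain_row))    # no transitions -> no stripes
```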
As readers familiar with neurophysiology have probably already guessed, there was a reason for the choice of zebras as the animal to identify in this worked example.
The first identifying feature, namely stripes, is a deliberate reference to the work of Hubel & Wiesel, who won a Nobel Prize in 1981 for their work on the mechanisms that the visual system uses to recognise patterns such as lines and stripes.
This work meant that theoretical models of visual processing no longer consisted of a series of black boxes. Instead, those models bottomed out in clearly identified biological processes.
The second feature used in the worked example, namely colour, is in a similar state. The neurophysiology of colour perception by individual cells is now well understood down to the molecular level.
The third feature, namely identifying hooves, is in an interesting state of transition from black box to being thoroughly understood. A lot is now known about how visual information can be processed to identify specified types of object, such as hooves.
Some of the sub-processes, such as edge recognition algorithms, are well enough understood to be routinely incorporated into widely used software, such as the “remove image background” function in PowerPoint. This function works by identifying where an image might contain an edge between two objects, by assessing how abrupt the change is in e.g. colour or darkness between adjacent pixels in the image.
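A crude sketch of this kind of edge test, assuming a single row of grey-level pixel values: flag every position where the brightness jump between neighbouring pixels exceeds a threshold. This is far simpler than what production software actually does; it just shows the principle.

```python
# A minimal edge test: flag positions where adjacent grey values change abruptly.
def edge_positions(row, threshold=50):
    """Return indices where the brightness jump between neighbours is abrupt."""
    return [i for i in range(len(row) - 1)
            if abs(row[i + 1] - row[i]) > threshold]

# A row crossing a dark object, a bright object, and dark background again.
row = [10, 12, 11, 200, 205, 203, 15, 14]
print(edge_positions(row))  # jumps at the object boundaries: [2, 5]
```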
Other sub-processes, though, are still being unpacked by research. For instance, edge detection algorithms often have trouble handling hidden edges, where part of an object is obscured by another object in the foreground (as in the case of a zebra’s hooves being partially obscured by grass or stones).
Some technical points
Practical issues with ANNs – training sets, cleaning up images, etc
Readers who already know about ANNs will have noticed that I haven’t referred above to issues such as the size of training sets, or supervised versus unsupervised learning. This was deliberate, for simplicity and brevity.
You need large numbers of examples to train an ANN; it’s not as simple as telling it “look for stripes”. It has to learn how to do this. “Large” in this context can mean tens of thousands or hundreds of thousands. The implications for human learning via the connectionist route are clear; you may need to show a learner large numbers of cases before they grasp the underlying principles, which is precisely what research into implicit learning in humans has found.
There are a lot of low-level practical issues that need to be sorted out if you’re using an ANN. One is that you usually have to clean up the data before it goes into the ANN. In the example above, I’ve provided “clean” images by choosing photographs that show the animals fairly clearly, with no complications such as parts of the animal being obscured by rocks or trees.
Connectionist approaches are very good at handling some types of problem, but are not so good at other types. They’re also inscrutable; it’s often difficult or impossible to work out just how they are solving a problem, or, worse, appearing to solve a problem.
There’s a widely-circulated story of an ANN that appeared to be excellent at distinguishing images of Russian tanks from photos of American tanks. What it was actually doing, however, was distinguishing between photos taken in gloomy lighting (the Russian tanks) and photos taken in bright sunshine (the American tanks).
Connectionist approaches, whether electronic or biological, often find solutions that are as simple and effective and as potentially error-prone as the tanks example. This has close links with the literature on human error, particularly with regard to heuristics and biases, where the distinction between “error” and “best guess” can become blurred or meaningless.
At a more formal level, some types of logical association are difficult for ANNs to learn; again, that ties in closely with the literature on error and reasoning. For brevity, I won’t go into those issues here, but the whole issue of differentiating “right” from “wrong” in real-world risk and error management is an important one, which we’ll revisit in later articles.
Fuzzy logic, parallel processing and serial processing
We’ve frequently mentioned fuzzy logic, parallel processing and serial processing in previous articles. These concepts are extremely important for handling real-world problems. Problems that are easy to solve using fuzzy logic and/or parallel processing are usually difficult to solve using serial processing, and vice versa. The human brain is built on a connectionist architecture, which makes it good at parallel processing and fuzzy logic; its handling of serial processing is very much a jury-rigged extension to its original architecture. This is one reason why humans are not very good at logical rational thought, even in cases where logical rational thought is the appropriate mechanism to use.
Implications for education and other purposes
There’s been a lot of recent interest in using cognitive psychology in education theory and policy. That’s sensible and commendable, but it needs to be handled with caution.
Cognitive psychology is complex, and easily misunderstood. Unfortunately, some popular writers are starting to rush in where angels are wary of treading.
Some areas of cognitive psychology, such as memory, look fairly easy to understand. That can give a false sense of confidence. For a solid understanding of cognitive psychology, you also need to have a solid grasp of other, less accessible concepts. As an example of what that involves, here’s the abstract from an arbitrarily chosen article from the 1990s about Artificial Neural Networks. By the standards of the ANN literature, it’s pretty accessible reading.
This paper introduces a hybrid system termed cascade adaptive resonance theory mapping (ARTMAP) that incorporates symbolic knowledge into neural-network learning and recognition. Cascade ARTMAP, a generalization of fuzzy ARTMAP, represents intermediate attributes and rule cascades of rule-based knowledge explicitly and performs multistep inferencing. A rule insertion algorithm translates if-then symbolic rules into cascade ARTMAP architecture. Besides that initializing networks with prior knowledge can improve predictive accuracy and learning efficiency, the inserted symbolic knowledge can be refined and enhanced by the cascade ARTMAP learning algorithm. By preserving symbolic rule form during learning, the rules extracted from cascade ARTMAP can be compared directly with the originally inserted rules. Simulations on an animal identification problem indicate that a priori symbolic knowledge always improves system performance, especially with a small training set. Benchmark study on a DNA promoter recognition problem shows that with the added advantage of fast learning, cascade ARTMAP rule insertion and refinement algorithms produce performance superior to those of other machine learning systems and an alternative hybrid system known as knowledge-based artificial neural network (KBANN). Also, the rules extracted from cascade ARTMAP are more accurate and much cleaner than the NofM rules extracted from KBANN.
Tan, Ah-Hwee (1997). Cascade ARTMAP: integrating neural computation and symbolic knowledge processing. IEEE Transactions on Neural Networks, 8(2).
That’s accessible language, by connectionist standards. Specialist technical connectionist language is a lot harder, and so are the concepts that go with the language. The literature on connectionist approaches is huge, and sophisticated, and well-grounded in theory and in practice.
That literature and those approaches have far-reaching implications for a wide range of fields. They also make it clear that the way the brain operates, and the way that connectionist software operates, is often very different from what we might expect. The brain is in many ways unsettlingly alien.
In particular, research about connectionism demonstrates that the best way of handling complex, uncertain and/or incomplete information is very different from the rigid categorisation favoured by bureaucracies and naïve models of “facts”.
Concepts from cognitive psychology should be included in debates about education practice and policy. However, this area is complex, and the waters have been muddied by well-intentioned amateurs publishing popular texts that advocate education policy based on limited and garbled misunderstandings of the cognitive psychology literature.
If you’re wondering which sources to trust, the usual principles apply. Popular texts by non-specialists are at the bottom of the stack; some may be good, but many will be inaccurate, and some will be grossly misleading. Textbooks are usually okay, but they’re usually constrained by word counts, and have to simplify complex issues as a result. In some situations, the simplification isn’t a problem; in other situations, the simplification is a serious problem. Peer reviewed journal articles are the most trustworthy source of information, but they’re usually difficult for non-specialists to understand.
For non-specialists, the best place to start is usually introductory material written by good specialists. “Introductory material” isn’t necessarily the same thing as textbooks; textbooks are usually intended to be used in conjunction with lectures, so textbooks are often terse and short of relevant detail, on the assumption that the relevant extra information will be supplied in the lecture. (There are a lot of other problems with textbooks, which I’ll discuss in another article.)
How do you decide whether a writer is a “good specialist” rather than a garbling amateur? It’s a good idea to look at the usual indicators of expertise – for instance, whether the author has relevant qualifications, or has published a significant number of peer-reviewed articles in good-quality journals in the relevant area. There are good cognitive psychologists publishing in education theory, and their work needs to be more widely known.
We’ll return to the themes above in more detail in later articles. In the meantime, I hope you’ve found this article useful.
Notes, sources and links
I’m using the Pinterest picture of the brain-as-office under fair use terms, since it’s a low quality image which has already been widely circulated on the Internet, and is being used here in an academic research context.
The other images are listed below in the order in which they appear in this article.