How Neural Networks Process Information

Neural networks process information very differently from standard computers based on the von Neumann architecture. Whereas von Neumann machines rely on discrete, sequential processing, neural networks are highly parallel and continuous in nature. These differences matter both for designing useful devices and because they seem to provide a closer match to the way people operate. We will demonstrate four properties of neural networks:
  1. Content addressability - the ability to access memories given any of the components of the fact or episode.
  2. Robustness to noise - the ability to access memories despite incomplete or even incorrect cues.
  3. Generalization - the ability to generalise over a set of instances.
  4. Default assignment - the ability to assign plausible default values if a given fact is not in memory.
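
Readers who want to follow the exercises without the BrainWave simulator may find a rough Python sketch of this kind of network useful. Everything below is invented for illustration: the four-member roster, the weight values, the parameter values and the external input strength are assumptions, much smaller and simpler than the tutorial's Jets and Sharks network (which is built from table 1, not reproduced here), and the update rule is the simplified one described in the endnotes. Only the qualitative behaviour matters.

```python
import numpy as np

MAX, MIN, REST, DECAY = 1.0, -0.2, -0.1, 0.1    # assumed parameter values

# An invented four-member roster: one name unit and one instance unit per
# member, plus shared property units for gang, age and occupation.
members = {"Alan": ["Jet",   "30s", "Pusher"],
           "Bob":  ["Jet",   "20s", "Burglar"],
           "Carl": ["Shark", "30s", "Burglar"],
           "Dave": ["Shark", "40s", "Burglar"]}
props = ["Jet", "Shark", "20s", "30s", "40s", "Pusher", "Burglar"]
units = list(members) + ["i_" + m for m in members] + props
idx = {u: i for i, u in enumerate(units)}

w = np.zeros((len(units), len(units)))

def connect(a, b, weight):
    w[idx[a], idx[b]] = w[idx[b], idx[a]] = weight

# Excitatory links between each instance unit and its name and properties.
for m, plist in members.items():
    for u in [m] + plist:
        connect("i_" + m, u, 0.1)

# Mutual inhibition within each pool of competing units.
pools = [list(members), ["i_" + m for m in members],
         ["Jet", "Shark"], ["20s", "30s", "40s"], ["Pusher", "Burglar"]]
for pool in pools:
    for a in pool:
        for b in pool:
            if a != b:
                connect(a, b, -0.1)

def cycle(a, ext, n=1):
    """Run n cycles of the simplified IAC update rule used in this tutorial."""
    for _ in range(n):
        net = w @ a + ext                       # pooled input to every unit
        delta = np.where(net > 0,
                         net * (MAX - a),       # excitation drives toward max
                         net * (a - MIN))       # inhibition drives toward min
        a = a + delta - DECAY * (a - REST)      # decay pulls back toward rest
    return a

def run(clamped, n=20):
    """Start every unit at rest, clamp the named units on, cycle n times."""
    a = np.full(len(units), REST)
    ext = np.zeros(len(units))
    for u in clamped:
        ext[idx[u]] = 0.4                       # assumed external input strength
    return cycle(a, ext, n)
```

The exercise snippets in the sections that follow all continue from this sketch, reusing run() and idx with the invented roster.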

Content Addressability

In a von Neumann architecture data is stored at specific locations or addresses in the computer's memory. In order to retrieve that data, its unique address must be known. Similarly, in relational databases information is stored in rows of a table and certain fields of each row are designated as the "keys" for that record. The data in these fields is usually unique and the database is optimised for queries that use these fields to retrieve the row.

Contrast this with human memory. We can be reminded of facts and episodes by quite obscure cues, cues which are often not unique when taken in isolation. We are able to locate records (or memories) based on any of the information stored in that record. For instance, if I say to a friend, "I like your blue shirt with the stripes", they will often know exactly which one I mean, despite the fact that I have not provided a unique identifier for that shirt. Furthermore, the cues given, blue and striped, may specify the shirt uniquely only when taken in combination (if your friend owns other blue shirts and other striped shirts).

Of course, it is possible, with an appropriate query, to retrieve the same information from a computer database. In human memory, however, retrieval from the content of a memory is automatic: human memory is fundamentally content addressable.


Figure 9: Content Addressability.
Exercise 19: To illustrate content addressability in the Jets and Sharks network, activate the Jet and 30s units and run 20 cycles. While these characteristics are not unique individually, together they specify a unique gang member. Who is it?
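
The same probe can be run against the toy network sketched after the list of four properties. The expected winner mentioned in the comment refers to the invented roster, not to the tutorial's table 1.

```python
# Continuing the earlier sketch: clamp the Jet and 30s property units and
# let the network settle. In the invented roster only Alan is both a Jet
# and in his 30s, so his name unit should end up the most active.
a = run(["Jet", "30s"], n=20)
for name in ["Alan", "Bob", "Carl", "Dave"]:
    print(name, round(float(a[idx[name]]), 3))
```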

Robustness to Noise

Von Neumann architectures are discrete in nature, and this discrete nature allows them to retain information completely faithfully when subjected to small amounts of noise. Provided the noise is not sufficient to flip a bit from a one to a zero or vice versa, the information will be interpreted as intended. This is, of course, the secret behind the fidelity of digital recording. For a great many applications this is a very useful property: I would prefer that the bank retained the exact balance of my account, not an approximation or best guess.

In contrast, neural networks use redundancy in their structure to provide a best guess of the information to be retrieved. Such an approach is very useful in situations of extreme noise (such as speech recognition) where the information is incomplete or even incorrect. The next exercise demonstrates how the IAC network is resistant to erroneous information.


Figure 10: Robustness to Noise.
Exercise 20: In the network in figure 10 the Shark, 40s, H.S., Burglar and Divorced units are active. These are the characteristics of Rick, except that Rick is actually in his 30s, not his 40s (see table 1). Now run 20 cycles. Which name unit comes on?

Exercise 21: Run another 40 cycles (for a total of 60 cycles). What happens to the age units?
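
The same manipulation can be tried on the toy network from the earlier sketch, with Alan standing in for Rick and the 40s unit supplying the erroneous cue. How the age competition finally resolves depends on the weight and input values, so the tutorial's network is the authority for Exercises 20 and 21; the snippet only illustrates the mechanism.

```python
# Continuing the earlier sketch: probe with Alan's characteristics but a
# wrong age (he is in his 30s, not his 40s). The two surviving correct
# cues (Jet and Pusher) outweigh the single bad one, so Alan's name unit
# should still win, and with more cycles the 30s unit receives support
# from Alan's instance unit and begins to recover despite the clamped 40s.
a = run(["Jet", "40s", "Pusher"], n=60)
for u in ["Alan", "Bob", "Carl", "Dave", "20s", "30s", "40s"]:
    print(u, round(float(a[idx[u]]), 3))
```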

Generalization

One operation that people seem to be very good at is collapsing over a set of instances to establish a general trend. For instance, we might ask, "Are Americans more extroverted than Australians?". Unless you have read studies claiming that in fact they are, your only recourse is to collapse across the set of Americans and the set of Australians you know and to extract some form of central-tendency measure on the extrovert/introvert dimension. This is quite a difficult computation, but one that people perform routinely. The IAC network can accomplish spontaneous generalisation of this kind by activating a property unit and cycling.

Figure 11: Generalization.
Exercise 22: To ask the question "What are Single gang members like?" reset the network and toggle the Single unit on. Run 40 cycles. Which characteristics become active?

Exercise 23: The Art and Ralph instance units become active but the Sam instance unit does not, despite the fact that Sam is also Single. Why is this?

Exercise 24: Why does the 40s unit become active?
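
A toy analogue of Exercise 22 can be run on the invented roster from the earlier sketch, clamping the Burglar unit in place of Single.

```python
# Continuing the earlier sketch: ask "what are Burglars like?" by clamping
# only the Burglar unit. Bob, Carl and Dave are the Burglars, and two of
# the three are Sharks, so the Shark unit should come up more strongly
# than the Jet unit, while the three age units split evenly. The Burglars'
# instance units become active but Alan's (the lone Pusher's) does not,
# mirroring the pattern probed in Exercise 23.
a = run(["Burglar"], n=40)
for u in ["Jet", "Shark", "20s", "30s", "40s",
          "i_Alan", "i_Bob", "i_Carl", "i_Dave"]:
    print(u, round(float(a[idx[u]]), 3))
```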

Default Assignment

The final property that we will examine is the ability of the IAC network to provide plausible default values if it does not "know" a given piece of information. The human memory system makes extensive use of plausible defaults. In fact, people can have difficulty distinguishing actual memories from those that have been reconstructed from other related information. In the IAC network the provision of plausible defaults is closely related to generalisation. Items which are similar to the target item are used to extrapolate what the missing information should be. In the following exercise we will remove some of the weights in the Jets and Sharks network and see if it can provide reasonable answers.

Figure 12: Default Assignment.
Exercise 25: First, run 30 cycles and note the properties of Ralph. Now remove the weights between the Ralph instance unit and the 30s unit, and between the Ralph instance unit and the Single unit (remove the weights in both directions, that is, four weights in all). Reset the network, toggle the Ralph instance unit on, and run 120 cycles. How successful was the network at guessing Ralph's age and marital status? Explain the results.
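
The analogous surgery on the toy network from the earlier sketch deletes Alan's age links; the "borrowed" default named in the comment is a property of the invented roster, not a claim about the tutorial's network.

```python
# Continuing the earlier sketch: delete the links between Alan's instance
# unit and his age unit (both directions), then clamp his instance unit
# and cycle. With no stored age for Alan, the network borrows one from
# similar members: Bob, the only other Jet, lends weak support to the
# 20s unit via the shared Jet property.
w[idx["i_Alan"], idx["30s"]] = w[idx["30s"], idx["i_Alan"]] = 0.0
a = run(["i_Alan"], n=120)
for u in ["20s", "30s", "40s"]:
    print(u, round(float(a[idx[u]]), 3))
```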


References

McClelland, J. L. (1981). Retrieving general and specific information from stored knowledge of specifics. Proceedings of the Third Annual Meeting of the Cognitive Science Society, 170-172.

McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.

McClelland, J. L., & Rumelhart, D. E. (Eds.). (1988). Explorations in parallel distributed processing: A handbook of models, programs, and exercises. Cambridge, MA: MIT Press.

Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.

Rumelhart, D. E., & McClelland, J. L. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: MIT Press.

Endnotes

[1] In the McClelland and Rumelhart formulation, negative activations coming from other units are set to zero before entering the net input calculation. For simplicity's sake we have not included this thresholding process; the restriction has little impact on the major points we are trying to address.

[2] In the McClelland and Rumelhart version, an activation that falls outside the boundaries is set to whichever of max or min is closer. This prevents activations from quickly becoming very large or very small, which can occur if parameter values are large. We have not included this component, so that the natural tendency of the activation rule to keep activations within bounds can be observed. Be warned, though: if your activations are growing very large you may need to decrease some of the parameter values.
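
For concreteness, here is how these two components would modify the cycle() function from the earlier sketch. The code restates what endnotes [1] and [2] describe; the exact details of the McClelland and Rumelhart implementation may differ.

```python
import numpy as np

def mr_cycle(a, w, ext, MAX=1.0, MIN=-0.2, REST=-0.1, DECAY=0.1, n=1):
    """The simplified update rule plus the two omitted components."""
    for _ in range(n):
        sender = np.maximum(a, 0.0)        # [1] negative activations do not propagate
        net = w @ sender + ext
        delta = np.where(net > 0, net * (MAX - a), net * (a - MIN))
        a = a + delta - DECAY * (a - REST)
        a = np.clip(a, MIN, MAX)           # [2] out-of-range activations are clamped
    return a
```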

