The BackPropagation Network:
Learning by Example

Copyright © Devin McAuley, 1997.
Updated for BrainWave 2.0 by Simon Dennis 1999.
  1. Introduction
  2. Background *
  3. The BackPropagation Algorithm *
  4. Simulation Issues
  5. Making Predictions: The Jets and Sharks Revisited
  6. References
  7. Slides

* These sections contain some mathematics which can be omitted on a first reading if desired.

Introduction

In many real-world situations, we are faced with incomplete or noisy data, and it is important to be able to make reasonable predictions about what is missing from the information available. This can be an especially difficult task when there isn't a good theory available to help reconstruct the missing data. It is in such situations that Backpropagation (BackProp) networks may provide some answers.

A BackProp network consists of at least three layers of units: an input layer, at least one intermediate hidden layer, and an output layer (see Figure 1). In contrast to the IAC and Hopfield networks, connection weights in a BackProp network are one-way. Typically, units are connected in a feed-forward fashion, with input units fully connected to units in the hidden layer and hidden units fully connected to units in the output layer. When a BackProp network is cycled, an input pattern is propagated forward to the output units through the intervening input-to-hidden and hidden-to-output weights.
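To make these feed-forward dynamics concrete, here is a minimal sketch in Python of a single forward pass, assuming the logistic (sigmoid) activation function standard in BackProp networks. The layer sizes match the Jets and Sharks network described below; the variable names and random initialization are illustrative, not BrainWave internals.

    import numpy as np

    def sigmoid(x):
        # Logistic activation: squashes the net input into (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    def feedforward(inputs, w_ih, b_h, w_ho, b_o):
        # Propagate an input pattern through the hidden layer to the outputs.
        hidden = sigmoid(inputs @ w_ih + b_h)
        output = sigmoid(hidden @ w_ho + b_o)
        return hidden, output

    # Illustrative sizes for the Jets and Sharks network below:
    # 12 input units, 4 hidden units, 2 output units.
    rng = np.random.default_rng(0)
    w_ih = rng.uniform(-0.5, 0.5, (12, 4))   # input-to-hidden weights
    w_ho = rng.uniform(-0.5, 0.5, (4, 2))    # hidden-to-output weights
    b_h, b_o = np.zeros(4), np.zeros(2)      # biases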

We can interpret the output of a BackProp network as a classification decision. In Chapter 3, we explored how the IAC network can be used to encode and retrieve information about members of two rival gangs (the Jets and the Sharks). For this problem, we could have instead used a BackProp network to make gang classification judgments (Jets or Sharks) based on the characteristics of the gang members. A key difference between the BackProp and IAC approaches is learning. IAC networks have fixed weights, which means that for the Jets and Sharks problem, all of the knowledge about the gang members has to be hard-wired into the network. BackProp networks are not limited in this way because they can adapt their weights to acquire new knowledge. In this chapter, we will explore how BackProp networks learn by example, and how they can be used to make predictions.

With BackProp networks, learning occurs during a training phase in which each input pattern in a training set is applied to the input units and then propagated forward. The pattern of activation arriving at the output layer is then compared with the correct (associated) output pattern to calculate an error signal. The error signal for each target output pattern is then backpropagated from the outputs to the inputs in order to adjust the weights in each layer of the network appropriately. After a BackProp network has learned the correct classification for a set of inputs, it can be tested on a second set of inputs to see how well it classifies untrained patterns. Thus, an important consideration in applying BackProp learning is how well the network generalizes.
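This training phase can be sketched as a short Python function built on the feedforward sketch above, assuming plain gradient descent on the summed-squared error; the learning rate and update rule are illustrative, and BrainWave's implementation may differ in detail.

    def train_epoch(patterns, targets, w_ih, b_h, w_ho, b_o, lr=0.25):
        # One pass through the training set; returns the total error.
        tss = 0.0
        for x, t in zip(patterns, targets):
            hidden, output = feedforward(x, w_ih, b_h, w_ho, b_o)
            error = t - output
            tss += np.sum(error ** 2)
            # Error signal at the output units (sigmoid derivative).
            delta_o = error * output * (1.0 - output)
            # Backpropagate the error signal to the hidden units.
            delta_h = hidden * (1.0 - hidden) * (delta_o @ w_ho.T)
            # Adjust the weights and biases in each layer.
            w_ho += lr * np.outer(hidden, delta_o)
            b_o += lr * delta_o
            w_ih += lr * np.outer(x, delta_h)
            b_h += lr * delta_h
        return tss

Each call to train_epoch corresponds to one epoch of training, in the sense used in the exercises below.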

Figure 1: Classification with a Backpropagation network

The task of the BackProp network shown in Figure 1 is to classify individuals as Jets or Sharks using their age, educational level, marital status, and occupation as clues to which gang they belong to. The training data for the network (Table 1) consists of 16 individuals (8 Jets and 8 Sharks); cf. the Training Set and Output Set. The network applied to this training data is composed of 12 binary inputs (representing the different characteristics of gang members), 4 hidden units, and 2 output units (Jets or Sharks). As you can see from Table 1, Jets and Sharks gang members are either in their 20's, 30's, or 40's; have a junior high (J.H.), high school (H.S.), or college education; are single, married, or divorced; and work as pushers, burglars, or bookies. The network is trained to correctly classify the inputs as Jets or Sharks based on this information. For members of the Jets gang, the Jets unit has a target activation of 1.0 and the Sharks unit has a target activation of 0.0. Conversely, for members of the Sharks gang, the Sharks unit has a target activation of 1.0 and the Jets unit has a target activation of 0.0. Note that there is also a Test Set containing examples that do not appear in the Training Set. These patterns can be used to test the network's generalization performance. (A sketch of one plausible input encoding follows the table.)



Table 1: Members of the Jets and Sharks gangs (adapted from McClelland and Rumelhart, 1986).

Name Gang Age Education Marital Status Occupation
Robin Jets 30's College Single Pusher
Bill Jets 40's College Single Pusher
Mike Jets 20's H.S. Single Pusher
Joan Jets 20's J.H. Single Pusher
Catherine Jets 20's College Married Pusher
John Jets 20's College Divorced Pusher
Joshua Jets 20's College Single Bookie
Bert Jets 20's College Single Burglar
Margaret Sharks 30's J.H. Married Bookie
Janet Sharks 20's J.H. Married Bookie
Alfred Sharks 40's H.S. Married Bookie
Gerry Sharks 40's College Married Bookie
Brett Sharks 40's J.H. Single Bookie
Sandra Sharks 40's J.H. Divorced Bookie
Beth Sharks 40's J.H. Married Pusher
Maria Sharks 40's J.H. Married Burglar
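
As a concrete illustration of the 12-unit input encoding, one plausible scheme is to devote one binary unit to each feature value; the exact unit ordering in BrainWave's pattern sets may differ from this sketch.

    # One binary unit per feature value: 3 ages + 3 education levels
    # + 3 marital statuses + 3 occupations = 12 input units.
    FEATURES = [
        ["20's", "30's", "40's"],
        ["J.H.", "H.S.", "College"],
        ["Single", "Married", "Divorced"],
        ["Pusher", "Burglar", "Bookie"],
    ]

    def encode(age, education, marital, occupation):
        # Build a 12-element binary vector with a single 1 per feature group.
        vector = []
        for values, value in zip(FEATURES, [age, education, marital, occupation]):
            vector += [1.0 if v == value else 0.0 for v in values]
        return vector

    # Robin, a Jet: input vector plus target pattern [1, 0] for (Jets, Sharks).
    robin = encode("30's", "College", "Single", "Pusher")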

Beside the BPLearn button is a value labelled "Error" which monitors the total summed-squared error (TSS) of the network on the training set.
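In terms of the sketches above, TSS could be computed with a small (hypothetical) helper that sums the squared differences between target and actual output activations over all output units and all training patterns:

    def total_error(patterns, targets, w_ih, b_h, w_ho, b_o):
        # TSS: squared target-output differences, summed over units and patterns.
        return sum(np.sum((t - feedforward(x, w_ih, b_h, w_ho, b_o)[1]) ** 2)
                   for x, t in zip(patterns, targets))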

In order for a BackProp network to learn properly, you first need to randomize its weights by clicking on the "Randomize Weights and Biases" button.

To feed the activations forward from input to output click the Feedforward button. To apply a specific pattern to the inputs, click on the label for that pattern in the respective input set, or change the input activations directly.

Exercise 1: Randomize the weights and biases. Cycle the network on each of the input patterns in the training set and inspect the activations on the output units. For each individual, record whether that individual is classified as a Jet or a Shark by comparing the relative activation levels of the Jets and Sharks units.

As you can see by comparing the actual outputs with the correct outputs, the performance of the network is poor. This is because the network has not been trained yet.

Exercise 2: Does the untrained network have a classification bias?

When applying BackProp learning, it is convenient to be able to monitor the progress of learning over time. One way to do this is to create a graph of the total error.
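Within the simulator, BrainWave's graph tool does this for you. Outside it, the same idea can be sketched using the train_epoch and encode helpers above together with matplotlib; the one-pattern training set here is only a placeholder for the full 16 rows of Table 1.

    import matplotlib.pyplot as plt

    # Encoded training set; [1, 0] = Jets, [0, 1] = Sharks.
    patterns = [encode("30's", "College", "Single", "Pusher")]  # ...plus the other 15 rows
    targets = [[1.0, 0.0]]                                      # ...matching target patterns

    errors = []
    for epoch in range(40):
        # Record the total error after each pass through the training set.
        errors.append(train_epoch(patterns, targets, w_ih, b_h, w_ho, b_o))

    plt.plot(errors)
    plt.xlabel("Epoch")
    plt.ylabel("Total summed-squared error (TSS)")
    plt.show()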

Exercise 3: Use the graph tool to graph the output set error. Change the maximum value that can be displayed in the graph from 1 to 10. Now, randomize the weights and biases of the Jets and Sharks network and then train it for 40 epochs. What is the error on the output set after 40 epochs?

Exercise 4: After the network has been trained for 40 epochs, re-test it on the patterns in the training set. How well does the network classify the gang members as Jets or Sharks at this stage? Do any of the associations seem more difficult to learn than others? Why?

Exercise 5: Continue training the network until the total error is less than 0.04. Test the network on the patterns in the training set to confirm that it can make the correct classifications. What is the shape of the error curve (or learning curve) over the course of training?

In this introductory section, we have provided a gentle overview of training and testing BackProp networks in BrainWave using the Jets and Sharks problem as an example. In the next section, we consider the pattern classification problem in more detail. As we shall discover, two-layer networks consisting of an input and an output layer have specific limitations in the types of problems they can solve. However, by adding an "extra" hidden layer, neural networks can potentially map any set of inputs to any set of outputs. Unfortunately, a serious problem arises concerning how to learn that mapping. The BackProp algorithm was developed by Rumelhart, Hinton and Williams (1986) to address the learning problem for multilayer networks. Having motivated BackProp, we examine the feedforward dynamics of a BackProp network and derive the BackProp learning algorithm. Next, we consider some simulation issues in applying BackProp networks to problems. To conclude the tutorial, we return to the Jets and Sharks example to address the question of how a BackProp network can be used to make predictions.

[Next Section: Background]