Suppose we have a large plastic sheet that we want to lay as flat as possible on the ground. To lay the sheet we employ a cyclic procedure. First, we find a low point on it. Then we build up that section by placing sand underneath it. Then we find another low point and build it up, continuing until we are satisfied that the sheet is flat enough. How might we go about finding the low points? One possibility would be to place a drop of mercury on the sheet and watch as it rolls downhill. The point where it stops is low (although possibly not the lowest on the entire sheet) and would be a good candidate for raising.

The idea of the state of a system (the position of the mercury) descending on an energy surface (the plastic sheet) is a recurring one in the field of neural networks - both for describing changes to activations as a consequence of cycling and changes to weights as a consequence of learning. Figure 1 shows a schematic of our plastic sheet. The low points are the attractors. The region within which the drop will fall to a given attractor is its basin of attraction, and the paths taken by the drop are called trajectories.
Figure 1: Attractors, basins of attraction and trajectories.
Figure 2: Original (left), degraded (centre) and reconstructed (right) images.
Figure 2 shows the results from a Hopfield network that was trained on the Chipmunk and Bugs Bunny images on the left-hand side and then presented with either a noisy cue (top) or a partial cue (bottom). The images on the right-hand side have been reconstructed by the Hopfield network. Notice that even in the face of substantial degradation the network is able to reconstruct a very close approximation to the original image. The quality of the network's reconstruction depends on how similar the stored images are to one another and on how many images are stored relative to the size of the network.
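The behaviour in Figure 2 can be reproduced in a few lines of code. The sketch below is not the BrainWave simulator: it is a minimal stand-alone Hopfield network written with NumPy, using the learning and update rules introduced in the sections that follow, with random +1/-1 patterns (and a made-up network size) standing in for the cartoon images.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                   # number of units ("pixels")

# Two stored patterns of +1/-1 activations (random stand-ins for the images).
patterns = rng.choice([-1, 1], size=(2, n))

# Hebbian weights: sum over stored patterns of the products a_i * a_j.
W = patterns.T @ patterns
np.fill_diagonal(W, 0)                    # no self connections

def cycle(a, W, n_updates=2000):
    """Asynchronous updating: repeatedly pick one unit and threshold its net input."""
    a = a.copy()
    for i in rng.integers(0, len(a), size=n_updates):
        a[i] = 1 if W[i] @ a >= 0 else -1
    return a

# Degrade the first pattern by flipping 20 of its 100 units (a "noisy cue").
cue = patterns[0].copy()
cue[rng.choice(n, size=20, replace=False)] *= -1

recovered = cycle(cue, W)
print("units matching the original:", int(np.sum(recovered == patterns[0])), "of", n)
```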
Exercise 1: To see a Hopfield network in action, open your BrainWave simulator and load the Attractors network from the Networks menu. Turn off the Weights Visible option in the Options menu so that the weights are hidden. This network has been trained on the images of one, two, three and four in the Output Set. Select these patterns one at a time from the Output Set to see what they look like. Now select the Noisy Two pattern from the Test Set. You should see a representation of a two on the units, with some spurious units incorrectly on or off (see figure 3). Now click on the Cycle button. Note how the pattern gradually settles to the image of a two. The energy, shown as a value at the bottom of the screen and also graphed, continues to decrease until the pattern stabilizes. What is the value of the energy after 100 cycles?
Figure 3: The "Noisy Two" pattern on a Hopfield Network.
Figure 4: The Hopfield network architecture.
    sgn(x) =  1   if x >= 0
             -1   if x < 0
Figure 5: The sgn function.
You have already observed asynchronous updating in Exercise 1. On each cycle a new unit is selected (indicated by blue "handles") and the update rule is applied. Of course, much of the time the unit's activation before and after the update is the same, and so you see no change. On some occasions, however, the activation of the unit flips from red to blue or vice versa.
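In code, one asynchronous update amounts to choosing a unit at random, computing its net input and passing it through sgn. A minimal NumPy sketch (the weight matrix W, the activation vector a and the function names here are our own illustrations, not part of BrainWave):

```python
import numpy as np

rng = np.random.default_rng(1)

def sgn(x):
    # The transfer function of Figure 5: +1 when x >= 0, -1 otherwise.
    return 1 if x >= 0 else -1

def asynchronous_update(a, W):
    """Pick one unit at random, apply the update rule in place, report whether it flipped."""
    i = rng.integers(len(a))
    old = a[i]
    a[i] = sgn(W[i] @ a)                  # net input to unit i is sum_j w_ij * a_j
    return a[i] != old

# A small made-up example: symmetric random weights and a random starting state.
W = rng.normal(size=(6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
a = rng.choice([-1, 1], size=6)
flips = sum(asynchronous_update(a, W) for _ in range(20))
print("flips in 20 updates:", flips, " final state:", a)
```

As the text notes, most updates leave the chosen unit unchanged; only occasionally does it flip.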
That is, the change in a weight is equal to the product of the activations of the units which it connects: Δwij = ai aj. So if both units are on (ai = aj = 1) or both units are off (ai = aj = -1) the strength of the weight is increased; otherwise it is decreased. Note that the mathematical expression goes beyond Hebb's Law in that it allows a weight to increase when both units are off and to decrease when one unit's activation is 1 and the other's is -1. These properties are probably not physiologically justified and can be modified; however, the system is easiest to understand if the simple Hebbian rule is applied.
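As a sketch in NumPy (the two 6-unit patterns below are made up for illustration), applying the rule to a whole pattern at once is just an outer product:

```python
import numpy as np

def hebbian_update(W, a):
    """Apply delta w_ij = a_i * a_j for every pair of units (an outer product)."""
    return W + np.outer(a, a)

# Storing two hypothetical 6-unit patterns of +1/-1 activations.
p1 = np.array([ 1,  1, -1, -1,  1, -1])
p2 = np.array([-1,  1,  1, -1, -1,  1])

W = np.zeros((6, 6))
for p in (p1, p2):
    W = hebbian_update(W, p)

print(W)    # entries are +2, 0 or -2: agreements and disagreements accumulate
```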
The important property of Hebbian learning, as far as the Hopfield network is concerned, is that if we instantiate a pattern on the array of units and apply the above rule, that pattern becomes an attractor of the network. Subsequently, if the pattern of activation of the network is close to a stored pattern it will tend to migrate to that pattern. Furthermore, once it arrives at the attractor it will remain there indefinitely - the attractors are stable.
Exercise 2: To confirm that the stored patterns are stable, return to BrainWave. Turn the weights back on (using the Weights Visible item in the Options menu) and select the Zero Weights item from the Actions menu. The weights should all be black, indicating that they have been set to zero. Now click on the Learn button once. Doing this cycles through each of the patterns in the Output Set, instantiating each on the units and then applying the Hebbian learning rule (this happens behind the scenes - you won't see it on the screen). Be sure to do this only once. Doing it more than once will cause the weights to become larger. While this has no effect on the processing of the network, it will affect the Energy values which you will be recording. If you accidentally click Learn more than once, zero the weights and start again.
Now test each of the patterns in the Output Set by resetting the network, clicking on the pattern (One through to Four) and then clicking on Cycle. Notice that the patterns do not change and that the Energy starts at a low value and remains there. Record the Energy of each pattern.
| Pattern | Energy |
| --- | --- |
| One | |
| Two | |
| Three | |
| Four | |
In the next section we will discuss why the Hebbian learning rule leads to stable attractors at the stored patterns. The discussion contains some mathematics and can be omitted on a first reading.
A pattern is stable if sgn(netj) = aj for every unit j, where netj = Σi wij ai is the net input to unit j, because then the transfer function produces no changes. Now, with only one pattern stored (that being the pattern currently on the units), wij = ai aj and:

    sgn(netj) = sgn( Σi wij ai )
              = sgn( Σi (aj ai) ai )
              = sgn( n aj )
              = aj   for all j

since ai ai = 1 for every unit and n is positive.
Furthermore, provided more than half of the units in the starting pattern have the correct value, sgn(netj) will give the correct value for every unit and the network will converge to the stored pattern - so the pattern is an attractor of the network.
In this simple case there are actually two attractors. The second one is the inverse of the stored pattern - the reverse state. If more than half of the units in the starting pattern are different from the stored pattern then the network will end up in the reverse state.
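This is easy to check numerically. The sketch below builds a hypothetical 50-unit network storing one random pattern, starts it either fewer or more than half wrong, and reports whether it settles to the stored pattern or to the reverse state:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
p = rng.choice([-1, 1], size=n)            # the single stored pattern
W = np.outer(p, p)                         # Hebbian weights for one pattern
np.fill_diagonal(W, 0)                     # no self connections

def settle(a, W, n_updates=2000):
    """Cycle asynchronously until (with high probability) every unit has been updated."""
    a = a.copy()
    for i in rng.integers(0, len(a), size=n_updates):
        a[i] = 1 if W[i] @ a >= 0 else -1
    return a

for n_flipped in (10, 40):                 # fewer than half / more than half the units wrong
    start = p.copy()
    start[rng.choice(n, size=n_flipped, replace=False)] *= -1
    final = settle(start, W)
    label = ("stored pattern" if np.array_equal(final, p)
             else "reverse state" if np.array_equal(final, -p)
             else "something else")
    print(n_flipped, "units flipped at the start ->", label)
```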
Exercise 3: These reversed attractors are also present when more than one pattern has been stored. To confirm this reset the network and select the "Inverse Two" pattern from the Test Set. Now cycle the network. Note the reversed state is stable. What is the value of the energy? How does it compare against that of the original pattern?
The discussion above holds if there is only one pattern stored, but what if there are more patterns? According to the Hebbian update rule the weights become:

    wij = Σu aui auj

where u ranges over all of the stored patterns, of which there are p, and aui is the value of unit i in pattern u.

Now the stability condition for a stored pattern v is:

    sgn( Σi wij avi ) = avj   for all j

If we substitute the Hebbian weights and separate out the case where u = v we get:

    Σi wij avi = Σi Σu aui auj avi = n avj + Σi Σu≠v aui auj avi

where the second sum runs over all stored patterns u not equal to v.
If the absolute value of the second term (the crosstalk term) is less than n then sgn will still return avj, so the pattern will be stable and will act as an attractor for states close to it. This is often the case, particularly when the number of patterns is small compared to the number of units. In fact, it can be shown that for randomly created patterns performance will be good provided the number of patterns is less than about 0.138 times the number of units. The proof of this result is beyond the scope of this text, but interested students are referred to Hertz, Krogh and Palmer (1991) for the details.
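The crosstalk term is easy to measure for random patterns. The sketch below (a hypothetical 200-unit network) counts how often the stability condition fails at a stored bit as more patterns are added; it illustrates the trend, and is not a derivation of the 0.138 figure:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200                                    # number of units

for p_count in (5, 20, 40):                # roughly 0.025n, 0.1n and 0.2n patterns
    patterns = rng.choice([-1, 1], size=(p_count, n))
    W = patterns.T @ patterns              # Hebbian weights: w_ij = sum_u a_ui * a_uj
    np.fill_diagonal(W, 0)                 # no self connections

    net = patterns @ W                     # net[v, j] = sum_i w_ij * a_vi
    # A stored bit (v, j) is stable if sgn(net[v, j]) equals a_vj.
    unstable = np.mean(np.where(net >= 0, 1, -1) != patterns)
    print(f"{p_count:3d} patterns: {unstable:.2%} of stored bits violate the stability condition")
```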
The energy function of interest for Hopfield networks, and the one we have been using to this point, is:

    H = -1/2 Σij wij ai aj
To see that the stored patterns will be low points in the energy landscape consider the contribution of a single pattern (u) to the energy when that pattern is instantiated:
    -1/2 Σi Σj (aui auj) aui auj
      = -1/2 Σi Σj 1
      = -1/2 n²

since aui aui = auj auj = 1 for every unit.
Any departure of the current pattern from the stored pattern could lead to a negative term in the sum which would increase the entire energy. Exactly what the energy will be depends on the extent to which the stored patterns overlap. In general, however, the stored patterns will be low in energy.
The main property of the energy function is that it always decreases or stays constant as the network cycles. The next section demonstrates mathematically why this function will always decrease. It can be omitted on a first reading.
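You can also watch the decrease numerically. The sketch below (a hypothetical two-pattern network) records the energy H defined above after each asynchronous update of a degraded starting state; the recorded sequence never increases:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
patterns = rng.choice([-1, 1], size=(2, n))    # two hypothetical stored patterns
W = patterns.T @ patterns
np.fill_diagonal(W, 0)

def energy(a, W):
    # H = -1/2 * sum_ij w_ij * a_i * a_j
    return -0.5 * (a @ W @ a)

# Start from a degraded copy of the first pattern and cycle asynchronously.
a = patterns[0].copy()
a[rng.choice(n, size=15, replace=False)] *= -1

energies = [energy(a, W)]
for i in rng.integers(0, n, size=500):
    a[i] = 1 if W[i] @ a >= 0 else -1          # asynchronous update of unit i
    energies.append(energy(a, W))

print("energy never increases:",
      all(later <= earlier for earlier, later in zip(energies, energies[1:])))
print("starting energy:", energies[0], " final energy:", energies[-1])
```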
For this argument it is convenient to rewrite the energy as:

    H = C - Σ(ij) wij ai aj

where (ij) means all of the distinct pairs ij - so the pair 12 is the same as the pair 21. The constant term C accumulates the (ii) pairs; it depends only on the self connections and does not change as the activations change. Now consider the effect of an activation update on the energy. Suppose a'i is the new value of ai. If a'i = ai then the energy does not change. If a'i = -ai we can express the change in energy as (looking only at the terms containing ai, since it is the only thing changing):

    H' - H = (C - Σj≠i wij a'i aj) - (C - Σj≠i wij ai aj)
           = 2 ai Σj≠i wij aj
           = 2 ai neti - 2 wii

using a'i = -ai, neti = Σj wij aj and ai ai = 1. Unit i only flips when its current activation disagrees with sgn(neti), so ai neti <= 0 and the first term cannot be positive; the second term cannot be positive either, as long as wii >= 0. The energy therefore never increases.
In fact, in many networks (including the one we have been working with) the self connections are set to zero (wii = 0). As you can see from the above argument, the energy will decrease regardless. The self connections tend to make any pattern more stable, which introduces unwanted spurious states, and so they are omitted. In the Actions menu of BrainWave there is an option for selecting self weights. If you wish to create your own Hopfield network you can use the Select Self Weights option and then Cut to remove them.
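In the NumPy sketches above, the same pruning is done by zeroing the diagonal of the weight matrix after learning:

```python
import numpy as np

# Two hypothetical stored patterns of +1/-1 activations.
patterns = np.array([[ 1, -1,  1, -1],
                     [ 1,  1, -1, -1]])

W = patterns.T @ patterns     # Hebbian weights; each diagonal entry equals p (here 2)
np.fill_diagonal(W, 0)        # cut the self connections (w_ii = 0)
print(W)
```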
Exercise 4: Your task is to complete the following table of energy values. In each case, reset the network and select the appropriate pattern from the Test Set. Do one cycle and record the energy value. Then complete another 100 cycles (or until the pattern is complete) and record the energy value at that point.
| Pattern | Starting Energy | Final Energy |
| --- | --- | --- |
| Cut Off One | | |
| Cut Off Four | | |
| Noisy One | | |
| Noisy Two | | |
| Noisy Three | | |
| Noisy Four | | |
Hebb, D. O. (1949). The Organization of Behavior. New York: Wiley. Partially reprinted in Anderson and Rosenfeld (1988).
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81, 3088-3092.