Suppose we have a large plastic sheet that we want to lay as flat as possible on the ground. To lay our sheet we employ a cyclic procedure. Firstly, we find a low point on it. Then we build up that section by placing sand underneath it. Then we find another low point and build it up; continuing until we are satisfied that the sheet is flat enough. How might we go about finding the low points? One possibility would be to place a drop of mercury on the sheet and watch as it rolls down the sheet. The point where it stops is low (although possibly not the lowest on the entire sheet) and would be a good candidate for raising. The idea of the state of a system (position of the mercury) descending on an energy surface (the plastic sheet) is a recuring one in the field of neural networks  both for describing changes to activations as a consequence of cycling and changes to weights as a consequence of learning. Figure 1 shows a schematic of our plastic sheet. The low points are the attractors. The region within which the drop will fall to a given attractor is its basin of attraction and the paths taken by the drop are called trajectories.
Figure 1: Attractors, basins of attraction and trajectories.
Original  Degraded  Reconstruction 
Figure 2 shows the results of a Hopfield network which was trained on the Chipmunk and Bugs Bunny images on the left hand side and then presented with either a noisy cue (top) or a partial cue (bottom). The images on the right hand side have been reconstructed by the Hopfield network. Notice that even in the face of substantial degradation the network is able to reconstruct a very close approximation to the original image. The quality of the network's reconstruction depends on the similarity of the stored images and the number of images that are stored given the size of the network.
Figure 3: The "Noisy Two" pattern on a Hopfield Network.
sgn(x) =  1 if x >= 0 
1 if x < 0 
Figure 4: The sgn function.
You have already observed asynchronous updating in exercise one. On each cycle a new unit is selected and the update rule is applied. Of course, much of the time the units activation before and after update is the same and so you see no change. On some occasions, however, the activation of the unit flips from red to blue or vice versa.
That is, the change in a weight is equal to the product of the activations of the units which it connects. So if both units are on (a_{i} = 1) or both units are off (a_{i} = 1) the strength of the weight is increased, otherwise it is decreased. Note that the mathematical expression goes beyond Hebb's Law in that it allows for a weight to increase if both units are off and to decrease if the activations of its units are 1 and 1. These properties are probably not physiologically justified and can be modified, however the system is easiest to understand if the simple Hebbian rule is applied.
The important property of Hebbian learning as far as the Hopfield network is concerned is that if we instantiate a pattern on the array of units and apply the above rule that pattern becomes an attractor of the network. Subsequently, if the pattern of activation of the network is close to a stored pattern it will tend to migrate to that pattern. Furthermore, once it arrives at the attractor it will remain there indefinitely  the attractors are stable.
Exercise 2: To confirm that the stored patterns are stable return to network above. Turn the weights back on (using the Weights Visible item in the Options menu) and zero the weights. The weights should all be black indicating that they have been set to zero. Now click on the Learn button once. Doing this cycles through each of the patterns in the Output Set instantiating each on the units and then applying the Hebbian learning rule (this happens behind the scenes  you wont see it on the screen). Be sure to only do this once. Doing it more than once will cause the weights to become larger. While this has no affect on the processing of the network it will affect the Energy values which you will be recording. If you accidently click Learn more than once, zero the weights and start again.
Examine the self weights. You will ntoice that they are all equal. What is their value and why?
Now test each of the patterns in the Output Set by reseting the graph, clicking on the pattern (One through to Four) and then clicking on Cycle. Notice that the patterns do not change and the Energy starts at a low value and remains there. Record the Energy of each pattern.
Pattern  Energy 

One  
Two  
Three  
Four 
In the next section we will discuss why the Hebbian learning rule leads to stable attractors at the stored patterns. The discussion contains some mathematics and can be omitted on a first reading.
because then the transfer function produces no changes. Now with only one pattern stored (that being the pattern currently on the units):
sgn( w_{ij}a_{i})  = sgn( a_{j} a_{i} a_{i}) 
= sgn(n a_{j}) 

= a_{j} for all j 
Furthermore, provided at least half of the units in the starting pattern have the correct value sgn(net_{i}) will give the correct value and the network will converge to the stored pattern  so the pattern is an attractor of the network.
In this simple case there are actually two attractors. The second one is the inverse of the stored pattern  the reverse state. If more than half of the units in the starting pattern are different from the stored pattern then the network will end up in the reverse state.
Exercise 3: These reversed attractors are also present when more than one pattern has been stored. To confirm this reset the network and select the "Inverse Two" pattern from the Test Set. Now cycle the network. Note the reversed state is stable. What is the value of the energy? How does it compare against that of the original pattern?
The discussion above holds if there is only one pattern stored, but what if there are more patterns? According to the Hebbian update rule:
where u ranges over all of the stored patterns of which there are p and a_{ui} is the value of unit i in pattern u.
Now the stablity condition is:
Now if we separate out the case where u = v we get:
where ¬ u = v is read as u not equal to v.
If the absolute value of the second term (i.e. the crosstalk term) is less than n then the pattern will be stable and the pattern will act as an attractor for states close to it. This is often the case particularly if the number of patterns is small compared to the number of units. In fact, it can be shown that for randomly created patterns performance will be good provided the number of patterns is less than 0.138 by the number of units. The proof of this result is beyond the scope of this text, but interested students are referred to Hertz, Krogh and Palmer (1991) for the details.
The energy function of interest for Hopfield networks and which we have been using to this point is:
To see that the stored patterns will be low points in the energy landscape consider the contribution of a single pattern (u) to the energy when that pattern is instantiated:
H =  1/2_{ij} w_{uij} a_{ui}a_{uj} 
= 1/2 _{ij} a_{ui}a_{uj} a_{ui}a_{uj}  
= 1/2 n^{2} 
Any departure of the current pattern from the stored pattern could lead to a negative term in the sum which would increase the entire energy. Exactly what the energy will be depends on the extent to which the stored patterns overlap. In general, however, the stored patterns will be low in energy.
The main property of the energy function is that it always decreases or stays constant as the network cycles. The next section demonstrates mathematically why this function will always decrease. It can be omitted on a first reading.
where (ij) means all of the distinct pairs ij  so the pair 12 is the same as pair 21. The constant term C accumulates the (ii) pairs and is a constant which only depends of the number of units. Now consider the affect of an activation update on the energy. Suppose a_{i} is the new value of a_{i}. If a_{i} = a_{i} then the energy does not change. If a_{i} =  a_{i} we can express the change in energy as (looking at only those terms containing a_{i} since it is the only thing changing):
H  H  = C  _{¬ i = j} w_{ij} a_{i} a_{j}  (C  _{¬ i = j} w_{ij} a_{i} a_{j}) 
= 2 a_{i} _{¬ i = j} w_{ij} a_{j}  
= 2 a_{i} _{j} w_{ij} a_{j}  2 w_{ii} 
Exercise 4: Your task is to complete the following table of energy values. In each case, reset the network and select the appropriate pattern from the Test Set. Do one cycle and record the energy value. Then complete another 100 cycles (or until the pattern is complete) and record the energy value at that point.
Pattern  Starting Energy  Final Energy 

Cut Off One  
Cut Off Four  
Noisy One  
Noisy Two  
Noisy Three  
Noisy Four 
Hebb, D. O. (1949). The Organization of Behavior. New York: Wiley. Partially reprinted in Anderson and Rosenfeld (1988).
Hertz, J., Krough, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. AddisonWesley, Redwood City, CA.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. In Proceedings of the National Academy of Sciences, pp. 25542558. National Academy of Sciences.
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of twostate neurons. In Proceedings of the National Academy of Sciences, pp. 81:30883092. National Academy of Sciences.