Suppose we have a large plastic sheet that we want to lay as flat as possible on the ground. To lay the sheet we employ a cyclic procedure. First, we find a low point on it. Then we build up that section by placing sand underneath it. Then we find another low point and build it up, continuing until we are satisfied that the sheet is flat enough. How might we go about finding the low points? One possibility would be to place a drop of mercury on the sheet and watch as it rolls downhill. The point where it stops is low (although possibly not the lowest on the entire sheet) and would be a good candidate for raising.

The idea of the state of a system (the position of the mercury) descending on an energy surface (the plastic sheet) is a recurring one in the field of neural networks - both for describing changes to activations as a consequence of cycling and changes to weights as a consequence of learning. Figure 1 shows a schematic of our plastic sheet. The low points are the attractors. The region within which the drop will fall to a given attractor is its basin of attraction, and the paths taken by the drop are called trajectories.
Figure 1: Attractors, basins of attraction and trajectories.
Figure 2: Original (left), degraded (centre) and reconstructed (right) images.
Figure 2 shows the results from a Hopfield network that was trained on the Chipmunk and Bugs Bunny images on the left-hand side and then presented with either a noisy cue (top) or a partial cue (bottom). The images on the right-hand side have been reconstructed by the Hopfield network. Notice that even in the face of substantial degradation the network is able to reconstruct a very close approximation to the original image. The quality of the network's reconstruction depends on how similar the stored images are to one another and on how many images are stored relative to the size of the network.
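The behaviour in Figure 2 can be reproduced in a few lines of code. The sketch below is not the BrainWave simulator: it is a minimal stand-alone Hopfield network written with NumPy, using the learning and update rules introduced in the sections that follow, with random +1/-1 patterns (and a made-up network size) standing in for the cartoon images.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                   # number of units ("pixels")

# Two stored patterns of +1/-1 activations (random stand-ins for the images).
patterns = rng.choice([-1, 1], size=(2, n))

# Hebbian weights: sum over stored patterns of the products a_i * a_j.
W = patterns.T @ patterns
np.fill_diagonal(W, 0)                    # no self connections

def cycle(a, W, n_updates=2000):
    """Asynchronous updating: repeatedly pick one unit and threshold its net input."""
    a = a.copy()
    for i in rng.integers(0, len(a), size=n_updates):
        a[i] = 1 if W[i] @ a >= 0 else -1
    return a

# Degrade the first pattern by flipping 20 of its 100 units (a "noisy cue").
cue = patterns[0].copy()
cue[rng.choice(n, size=20, replace=False)] *= -1

recovered = cycle(cue, W)
print("units matching the original:", int(np.sum(recovered == patterns[0])), "of", n)
```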
Exercise 1: To see a Hopfield network in action, open your BrainWave simulator and load the Attractors network from the Networks menu. Turn off the Weights Visible option in the Options menu so that the weights are hidden. This network has been trained on the images of one, two, three and four in the Output Set. Select these patterns one at a time from the Output Set to see what they look like. Now select the Noisy Two pattern from the Test Set. You should see a representation of a two on the units, with some spurious units incorrectly on or off (see figure 3). Now click on the Cycle button. Note how the pattern gradually settles to the image of a two. The energy, shown as a value at the bottom of the screen and also graphed, continues to decrease until the pattern stabilizes. What is the value of the energy after 100 cycles?
Figure 3: The "Noisy Two" pattern on a Hopfield Network.
Figure 4: The Hopfield network architecture.
    sgn(x) =  1   if x >= 0
             -1   if x < 0
Figure 5: The sgn function.
You have already observed asynchronous updating in Exercise 1. On each cycle a new unit is selected (indicated by blue "handles") and the update rule is applied. Of course, much of the time the unit's activation before and after the update is the same, and so you see no change. On some occasions, however, the activation of the unit flips from red to blue or vice versa.
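In code, one asynchronous update amounts to choosing a unit at random, computing its net input and passing it through sgn. A minimal NumPy sketch (the weight matrix W, the activation vector a and the function names here are our own illustrations, not part of BrainWave):

```python
import numpy as np

rng = np.random.default_rng(1)

def sgn(x):
    # The transfer function of Figure 5: +1 when x >= 0, -1 otherwise.
    return 1 if x >= 0 else -1

def asynchronous_update(a, W):
    """Pick one unit at random, apply the update rule in place, report whether it flipped."""
    i = rng.integers(len(a))
    old = a[i]
    a[i] = sgn(W[i] @ a)                  # net input to unit i is sum_j w_ij * a_j
    return a[i] != old

# A small made-up example: symmetric random weights and a random starting state.
W = rng.normal(size=(6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
a = rng.choice([-1, 1], size=6)
flips = sum(asynchronous_update(a, W) for _ in range(20))
print("flips in 20 updates:", flips, " final state:", a)
```

As the text notes, most updates leave the chosen unit unchanged; only occasionally does it flip.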
That is, the change in a weight is equal to the product of the activations of the units which it connects: Δwij = ai aj. So if both units are on (ai = aj = 1) or both units are off (ai = aj = -1) the strength of the weight is increased; otherwise it is decreased. Note that the mathematical expression goes beyond Hebb's Law in that it allows a weight to increase when both units are off and to decrease when one unit's activation is 1 and the other's is -1. These properties are probably not physiologically justified and can be modified; however, the system is easiest to understand if the simple Hebbian rule is applied.
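As a sketch in NumPy (the two 6-unit patterns below are made up for illustration), applying the rule to a whole pattern at once is just an outer product:

```python
import numpy as np

def hebbian_update(W, a):
    """Apply delta w_ij = a_i * a_j for every pair of units (an outer product)."""
    return W + np.outer(a, a)

# Storing two hypothetical 6-unit patterns of +1/-1 activations.
p1 = np.array([ 1,  1, -1, -1,  1, -1])
p2 = np.array([-1,  1,  1, -1, -1,  1])

W = np.zeros((6, 6))
for p in (p1, p2):
    W = hebbian_update(W, p)

print(W)    # entries are +2, 0 or -2: agreements and disagreements accumulate
```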
The important property of Hebbian learning, as far as the Hopfield network is concerned, is that if we instantiate a pattern on the array of units and apply the above rule, that pattern becomes an attractor of the network. Subsequently, if the pattern of activation of the network is close to a stored pattern it will tend to migrate to that pattern. Furthermore, once it arrives at the attractor it will remain there indefinitely - the attractors are stable.
Exercise 2: To confirm that the stored patterns are stable, return to BrainWave. Turn the weights back on (using the Weights Visible item in the Options menu) and select the Zero Weights item from the Actions menu. The weights should all be black, indicating that they have been set to zero. Now click on the Learn button once. Doing this cycles through each of the patterns in the Output Set, instantiating each on the units and then applying the Hebbian learning rule (this happens behind the scenes - you won't see it on the screen). Be sure to do this only once. Doing it more than once will cause the weights to become larger. While this has no effect on the processing of the network, it will affect the Energy values which you will be recording. If you accidentally click Learn more than once, zero the weights and start again.
Now test each of the patterns in the Output Set by resetting the network, clicking on the pattern (One through to Four) and then clicking on Cycle. Notice that the patterns do not change and that the Energy starts at a low value and remains there. Record the Energy of each pattern.
| Pattern | Energy |
| --- | --- |
| One | |
| Two | |
| Three | |
| Four | |
In the next section we will discuss why the Hebbian learning rule leads to stable attractors at the stored patterns. The discussion contains some mathematics and can be omitted on a first reading.
A pattern is stable if sgn(netj) = aj for every unit j, where netj = Σi wij ai is the net input to unit j, because then the transfer function produces no changes. Now, with only one pattern stored (that being the pattern currently on the units), wij = ai aj and:

    sgn(netj) = sgn( Σi wij ai )
              = sgn( Σi (aj ai) ai )
              = sgn( n aj )
              = aj   for all j

since ai ai = 1 for every unit and n is positive.
Furthermore, provided more than half of the units in the starting pattern have the correct value, sgn(netj) will give the correct value for every unit and the network will converge to the stored pattern - so the pattern is an attractor of the network.
In this simple case there are actually two attractors. The second one is the inverse of the stored pattern - the reverse state. If more than half of the units in the starting pattern are different from the stored pattern then the network will end up in the reverse state.
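This is easy to check numerically. The sketch below builds a hypothetical 50-unit network storing one random pattern, starts it either fewer or more than half wrong, and reports whether it settles to the stored pattern or to the reverse state:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
p = rng.choice([-1, 1], size=n)            # the single stored pattern
W = np.outer(p, p)                         # Hebbian weights for one pattern
np.fill_diagonal(W, 0)                     # no self connections

def settle(a, W, n_updates=2000):
    """Cycle asynchronously until (with high probability) every unit has been updated."""
    a = a.copy()
    for i in rng.integers(0, len(a), size=n_updates):
        a[i] = 1 if W[i] @ a >= 0 else -1
    return a

for n_flipped in (10, 40):                 # fewer than half / more than half the units wrong
    start = p.copy()
    start[rng.choice(n, size=n_flipped, replace=False)] *= -1
    final = settle(start, W)
    label = ("stored pattern" if np.array_equal(final, p)
             else "reverse state" if np.array_equal(final, -p)
             else "something else")
    print(n_flipped, "units flipped at the start ->", label)
```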
Exercise 3: These reversed attractors are also present when more than one pattern has been stored. To confirm this reset the network and select the "Inverse Two" pattern from the Test Set. Now cycle the network. Note the reversed state is stable. What is the value of the energy? How does it compare against that of the original pattern?
The discussion above holds if there is only one pattern stored, but what if there are more patterns? According to the Hebbian update rule the weights become:

    wij = Σu aui auj

where u ranges over all of the stored patterns, of which there are p, and aui is the value of unit i in pattern u.

Now the stability condition for a stored pattern v is:

    sgn( Σi wij avi ) = avj   for all j

If we substitute the Hebbian weights and separate out the case where u = v we get:

    Σi wij avi = Σi Σu aui auj avi = n avj + Σi Σu≠v aui auj avi

where the second sum runs over all stored patterns u not equal to v.
If the absolute value of the second term (the crosstalk term) is less than n then sgn will still return avj, so the pattern will be stable and will act as an attractor for states close to it. This is often the case, particularly when the number of patterns is small compared to the number of units. In fact, it can be shown that for randomly created patterns performance will be good provided the number of patterns is less than about 0.138 times the number of units. The proof of this result is beyond the scope of this text, but interested students are referred to Hertz, Krogh and Palmer (1991) for the details.
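The crosstalk term is easy to measure for random patterns. The sketch below (a hypothetical 200-unit network) counts how often the stability condition fails at a stored bit as more patterns are added; it illustrates the trend, and is not a derivation of the 0.138 figure:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200                                    # number of units

for p_count in (5, 20, 40):                # roughly 0.025n, 0.1n and 0.2n patterns
    patterns = rng.choice([-1, 1], size=(p_count, n))
    W = patterns.T @ patterns              # Hebbian weights: w_ij = sum_u a_ui * a_uj
    np.fill_diagonal(W, 0)                 # no self connections

    net = patterns @ W                     # net[v, j] = sum_i w_ij * a_vi
    # A stored bit (v, j) is stable if sgn(net[v, j]) equals a_vj.
    unstable = np.mean(np.where(net >= 0, 1, -1) != patterns)
    print(f"{p_count:3d} patterns: {unstable:.2%} of stored bits violate the stability condition")
```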
The energy function of interest for Hopfield networks, and the one we have been using to this point, is:

    H = -1/2 Σij wij ai aj
To see that the stored patterns will be low points in the energy landscape consider the contribution of a single pattern (u) to the energy when that pattern is instantiated:
    -1/2 Σi Σj (aui auj) aui auj
      = -1/2 Σi Σj 1
      = -1/2 n²

since aui aui = auj auj = 1 for every unit.
Any departure of the current pattern from the stored pattern could lead to a negative term in the sum which would increase the entire energy. Exactly what the energy will be depends on the extent to which the stored patterns overlap. In general, however, the stored patterns will be low in energy.
The main property of the energy function is that it always decreases or stays constant as the network cycles. The next section demonstrates mathematically why this function will always decrease. It can be omitted on a first reading.
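You can also watch the decrease numerically. The sketch below (a hypothetical two-pattern network) records the energy H defined above after each asynchronous update of a degraded starting state; the recorded sequence never increases:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
patterns = rng.choice([-1, 1], size=(2, n))    # two hypothetical stored patterns
W = patterns.T @ patterns
np.fill_diagonal(W, 0)

def energy(a, W):
    # H = -1/2 * sum_ij w_ij * a_i * a_j
    return -0.5 * (a @ W @ a)

# Start from a degraded copy of the first pattern and cycle asynchronously.
a = patterns[0].copy()
a[rng.choice(n, size=15, replace=False)] *= -1

energies = [energy(a, W)]
for i in rng.integers(0, n, size=500):
    a[i] = 1 if W[i] @ a >= 0 else -1          # asynchronous update of unit i
    energies.append(energy(a, W))

print("energy never increases:",
      all(later <= earlier for earlier, later in zip(energies, energies[1:])))
print("starting energy:", energies[0], " final energy:", energies[-1])
```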
For this argument it is convenient to rewrite the energy as:

    H = C - Σ(ij) wij ai aj

where (ij) means all of the distinct pairs ij - so the pair 12 is the same as the pair 21. The constant term C accumulates the (ii) pairs; it depends only on the self connections and does not change as the activations change. Now consider the effect of an activation update on the energy. Suppose a'i is the new value of ai. If a'i = ai then the energy does not change. If a'i = -ai we can express the change in energy as (looking only at the terms containing ai, since it is the only thing changing):

    H' - H = (C - Σj≠i wij a'i aj) - (C - Σj≠i wij ai aj)
           = 2 ai Σj≠i wij aj
           = 2 ai neti - 2 wii

using a'i = -ai, neti = Σj wij aj and ai ai = 1. Unit i only flips when its current activation disagrees with sgn(neti), so ai neti <= 0 and the first term cannot be positive; the second term cannot be positive either, as long as wii >= 0. The energy therefore never increases.
In fact, in many networks (including the one we have been working with) the self connections are set to zero (wii = 0). As you can see from the above argument, the energy will decrease regardless. The self connections tend to make any pattern more stable, which introduces unwanted spurious states, and so they are omitted. In the Actions menu of BrainWave there is an option for selecting self weights. If you wish to create your own Hopfield network you can use the Select Self Weights option and then Cut to remove them.
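In the NumPy sketches above, the same pruning is done by zeroing the diagonal of the weight matrix after learning:

```python
import numpy as np

# Two hypothetical stored patterns of +1/-1 activations.
patterns = np.array([[ 1, -1,  1, -1],
                     [ 1,  1, -1, -1]])

W = patterns.T @ patterns     # Hebbian weights; each diagonal entry equals p (here 2)
np.fill_diagonal(W, 0)        # cut the self connections (w_ii = 0)
print(W)
```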
Exercise 4: Your task is to complete the following table of energy values. In each case, reset the network and select the appropriate pattern from the Test Set. Do one cycle and record the energy value. Then complete another 100 cycles (or until the pattern is complete) and record the energy value at that point.
| Pattern | Starting Energy | Final Energy |
| --- | --- | --- |
| Cut Off One | | |
| Cut Off Four | | |
| Noisy One | | |
| Noisy Two | | |
| Noisy Three | | |
| Noisy Four | | |
Hebb, D. O. (1949). The Organization of Behavior. New York: Wiley. Partially reprinted in Anderson and Rosenfeld (1988).
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81, 3088-3092.