Most of us have had the annoying experience of recognizing a face but not being able to remember to whom it belongs. We curse, make a reference to our age and reach for the nearest pop-psychology manual on how to improve our memories in ten easy steps. When you stop to think about it, however, the really astonishing thing is how often we don't forget. In a study by Shepard (1967) subjects were given a list of 580 arbitrary words to remember. On a forced choice test they scored at just under 88% correct. That's impressive. Furthermore, I could ask you what you were doing at 12:30pm one week ago and there is a good chance you would be correct, even if it isn't something you do all the time. Yet between then and now you have had thousands of experiences any one of which I could have queried. Somehow all of those experiences have left their mark, often without you even thinking about it. Long term memory is big.
The other really astonishing thing about memory is how flexibly we can access it. Suppose I ask you to list all the situation comedies you have ever watched. Most people from television-dependent cultures won't have too much difficulty coming up with a reasonably large collection. Yet how could you answer this question given the obscure cue "situation comedy"? A database system might store a set of records such as {(Gilligan's Island, situation comedy), (Full House, situation comedy), (World News, current affairs), etc.} and then cycle through these one at a time, retrieving those that have the situation comedy tag. But this would require that when you are watching television shows you are constantly tagging them as situation comedy, current affairs and so on. If you had tagged the program as "the funny show with the skinny sailor in it", the search for situation comedy would come back with NO RECORDS FOUND. Human memory is much more robust: we often use what seem to be very obscure cues, yet we are able to retrieve well.
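To see why rigid tagging is brittle, consider a minimal sketch of the lookup scheme just described. The records and tags are invented for illustration.

```python
# A minimal sketch of the tag-based lookup just described.
# Records and tags are invented for illustration.
records = [
    ("Full House", "situation comedy"),
    ("World News", "current affairs"),
    # Tagged idiosyncratically at "encoding time":
    ("Gilligan's Island", "the funny show with the skinny sailor in it"),
]

def search(tag):
    """Cycle through the records one at a time, keeping exact tag matches."""
    return [title for title, t in records if t == tag] or ["NO RECORDS FOUND"]

print(search("situation comedy"))
# ['Full House'] -- Gilligan's Island is missed because its stored tag
# never matches the query, even though it is a situation comedy.
```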
Much of the research into human memory has been an exploration of just how flexible our memories are - of the different sorts of questions that we can use our memories to answer. In this chapter, we will consider two ways in which the questions that we ask of our long term memories can differ. The first of these refers to the nature of the output that a question requires - are we asking for a specific name, word or other item from memory (retrieval) or do we require a yes/no answer about whether we remember some fact or episode (matching). The second distinction refers to the role of context in the query - are we asking about what happened in a given episode or context (episodic) or is the query about the way that things tend to be in general (semantic). After considering these tasks, we will look in some detail at the Matrix Model of long term memory (Pike, 1984; Humphreys, Bain & Pike, 1989) which provides a theory of how these questions could be answered using the formalism of matrix algebra.
Free Association provides a subject with a cue (e.g. "Type of animal") and requires them to respond with the first word that comes to mind (e.g. cat). Sometimes a prior study list is presented to the subject and it has been shown that words that occur in the prior study list are more likely to be produced even though the subject is not instructed to use the study list to make their decisions. Free association is a retrieval task because the required response is a word.
Cued Recall with a List Associate requires the subject to study a list of pairs of words. At test, subjects are given a list of words and asked which word occurred with each of the test words during the study episode (e.g. "Which word occurred with boy in the study list?"). Again, cued recall with a list associate is a retrieval task because it requires the subject to respond with a word.
In contrast, some memory tasks, known as matching tasks, require what seems to be a more quantitative answer based on a continuous measure. The examples with which we will be concerned are familiarity rating and recognition.
Familiarity rating refers to a task in which subjects rate (on a five point scale, for instance) how familiar a word is to them in general (e.g. "How familiar is the word house to you?"). Familiarity rating is a matching task because it seems to be based on a continuous form of information.
Recognition requires a subject to study a list of words. At test, the subject is given a second list of words - some of which appeared in the first list and some of which did not. The subjects' task is to distinguish the targets (words that were on the list) from the distractors (words that weren't). Either they are asked to make a yes/no decision, or they rate their confidence that the word is old (on a five point scale, for instance). Recognition is considered a matching task because it relies upon a continuous form of information.
Table 1: Examples of Episodic/Semantic Tasks and Matching/Retrieval Tasks (adapted from Humphreys, Bain & Pike, 1989)
| Access Process | Episodic Memory | Semantic Memory |
| --- | --- | --- |
| Matching (produces a rating value) | Recognition | Familiarity Rating |
| Retrieval (produces a word) | Cued Recall | Free Association |

(The two memory columns differ in the use of the context cue.)
Before looking at how the Matrix Model accounts for the differences between these tasks, we need a grounding in tensors, the operations that can be performed on tensors, and how tensors can be mapped to neural network architectures. If you are comfortable with these ideas you should continue on to the next section. If not, we suggest you first complete the Hebbian learning chapter.
The following notation is used throughout. Item vectors are distinguished by subscripts (e.g. $a_i$). A distractor vector is indicated by $d$.

For rank-two (matrix) associations between a context and a list of items:

$x$ = n-element column vector (the context)
$a_j$ = n-element row vector (an item)

The item-in-context associations are formed as outer products and summed to form the memory for the list:

$$E = x a_1 + x a_2 + \dots + x a_k = \sum_{j=1}^{k} x a_j$$

For rank-three associations between a context and pairs of items:

$x$ = n-element column vector
$a_j$ = n-element row vector
$b_j$ = n-element vector in the third (orthogonal) direction

The resulting associations can be summed to form the memory for the list ($E$):

$$E = \sum_{j=1}^{k} x a_j b_j$$

Note: the type of vector (i.e., column, row, orthogonal) can also be inferred from the order of the vector symbols, where: 1st vector = column vector; 2nd vector = row vector; and 3rd vector = orthogonal.

Adding the pre-existing memory traces $S$ (for example, the pre-experimental experiences stored in the same tensor) gives the total memory:

$$M = \sum_{j=1}^{k} x a_j b_j + S$$
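To make these operations concrete, here is a minimal NumPy sketch of how the rank-two and rank-three memories might be built by superimposing outer products. The dimension n, the list length k, and the random vectors (scaled so that expected self-matches are about 1) are illustrative assumptions, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 5  # vector dimension and list length (illustrative values)

# Context vector x and item vectors a_j, b_j; random stand-ins scaled so
# that the expected self-match (e.g. x . x) is approximately 1.
x = rng.normal(0, 1 / np.sqrt(n), n)
a = rng.normal(0, 1 / np.sqrt(n), (k, n))
b = rng.normal(0, 1 / np.sqrt(n), (k, n))

# Rank-two memory: E = x a_1 + x a_2 + ... + x a_k (an n x n matrix).
E2 = sum(np.outer(x, a[j]) for j in range(k))

# Rank-three memory: E = sum_j x a_j b_j (an n x n x n tensor).
E3 = sum(np.einsum('i,j,k->ijk', x, a[j], b[j]) for j in range(k))

print(E2.shape, E3.shape)  # (10, 10) (10, 10, 10)
```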
One of the strengths of the Matrix Model is the number of ways in which information in the model can be accessed. In the introductory section, two dimensions on which tasks can differ were outlined. The first was the matching/retrieval dimension. Matching tasks are those based on a continuous form of information that typically require either a yes/no answer or a rating response (e.g. recognition). Retrieval tasks, by contrast, require a specific item to be returned (e.g. cued recall). This distinction is captured in the Matrix Model by the nature of the tensor that results once all cues have been applied. If the resultant tensor is a scalar, we are dealing with a matching process; the scalar can be compared against criteria to determine a yes/no or rating value. If the resultant tensor is a vector, we have a retrieval process; the vector can be compared against all item vectors, with the best-matching item being the output of the process. The next sections go through the mathematics of recognition and cued recall with a list associate, demonstrating how matching and retrieval tasks are accomplished within the model.
The second task dimension discussed in the introduction was the episodic/semantic dimension. Episodic tasks refer to a specific context, whereas in semantic (generalized) tasks information is integrated over a large number of experiences. The Matrix Model captures this distinction. In episodic tasks, a reinstated context vector is used as a cue. In semantic (generalized) tasks, a vector which is equally similar to all contexts is used, so as to average over all experiences with the cue items (typically, this is a vector with all components set to 1/n, where n is the dimension of the vector). The section entitled "Episodic versus Semantic Memory: Cuing with the Context Vector" describes an experiment designed to demonstrate the importance of the distinction and leads you through the process of modelling this experiment using the BrainWave simulator.
Studied Test Word ($a_i$)

In recognition, each studied word is associated with the study context, so the list memory is a rank-two tensor (a matrix): $M = \sum_j x a_j + S$, where $S$ is the sum of all other memory traces. At test, the reinstated context and the test word are combined into the matrix cue $x a_i$, and matching computes the dot product of this cue with memory:

$$x a_i \cdot M = x a_i \cdot \Big( \sum_j x a_j + S \Big)$$

$$= \sum_j x a_i \cdot x a_j + x a_i \cdot S$$

$$= \sum_j (x \cdot x)(a_i \cdot a_j) + x a_i \cdot S$$

$$= (x \cdot x)(a_i \cdot a_i) + \sum_{j \neq i} (x \cdot x)(a_i \cdot a_j) + x a_i \cdot S$$

Inserting the expected matching values:

$$E[x a_i \cdot M] = cs + (k - 1)cm + E[x a_i \cdot S]$$

Unstudied Test Word ($d$)

$$x d \cdot M = x d \cdot \Big( \sum_j x a_j + S \Big)$$

$$= \sum_j x d \cdot x a_j + x d \cdot S$$

$$= \sum_j (x \cdot x)(d \cdot a_j) + x d \cdot S$$

The match between the test cue and the experimental memories can further be collapsed down to:

$$E[x d \cdot M] = kcm + E[x d \cdot S]$$

where

- $c = E[x \cdot x]$ is the match between the contexts on the study and test occasions,
- $s = E[a_i \cdot a_i]$ is the match between a studied item and the same item presented at test,
- $m = E[a_i \cdot a_j]$, $i \neq j$ (and likewise $E[d \cdot a_j]$), is the match between two different items, and
- $k$ is the number of items in the study list.

Thus the final dot product represents the match of the contexts on the study and test occasions ($c$), weighted by the match of the items on the study and test occasions ($s$ and $m$). Because $s$ is large relative to $m$, memories that are conjointly defined by the context and test cues receive large weights, while other items studied in the same context, and previous contexts in which the items have appeared, receive only small weights. This is how the model limits interference from both sources.
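The matching computation is easy to check numerically. The sketch below, under the same illustrative assumptions as before (random vectors, $S$ omitted for simplicity), computes the scalar match for a studied word and for a distractor; the target's match is dominated by the $cs$ term, while the distractor's hovers around zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 10, 5
x = rng.normal(0, 1 / np.sqrt(n), n)       # study (and reinstated) context
a = rng.normal(0, 1 / np.sqrt(n), (k, n))  # studied items a_1 ... a_k
d = rng.normal(0, 1 / np.sqrt(n), n)       # an unstudied distractor

M = sum(np.outer(x, a[j]) for j in range(k))  # list memory (S omitted)

def match(context, item, memory):
    """Dot product of the matrix cue (context item) with memory: a scalar."""
    return float(np.sum(np.outer(context, item) * memory))

print(match(x, a[0], M))  # studied word: approx c*s + (k-1)*c*m
print(match(x, d, M))     # distractor:   approx k*c*m, near zero on average
```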
For the same reason, cued recall with a list associate is modelled using rank-three tensors that associate word pairs ($a_1 b_1, a_2 b_2, \dots, a_k b_k$) and context ($x$). The tensor is formed by taking the outer product of the context vector $x$ and the two item vectors, $a_j$ and $b_j$:

$$M = \sum_{j=1}^{k} x a_j b_j + S$$

Subjects are then asked to recall list targets ($b_i$) at test, using list associates ($a_i$) and context ($x$) as cues. The retrieval cues ($x$ and $a_i$) are combined to form an associative matrix cue ($x a_i$). Retrieval then involves the pre-multiplication of the rank-three tensor ($M$) by this retrieval cue:

$$x a_i \cdot M = x a_i \cdot \Big( \sum_j x a_j b_j + S \Big)$$

$$= \sum_j [(x a_i) \cdot (x a_j)] b_j + x a_i \cdot S$$

$$= \sum_j (x \cdot x)(a_i \cdot a_j) b_j + x a_i \cdot S$$

$$= (x \cdot x)(a_i \cdot a_i) b_i + \sum_{j \neq i} (x \cdot x)(a_i \cdot a_j) b_j + x a_i \cdot S$$

Inserting the expected values:

$$E[x a_i \cdot M] = cs\, b_i + cm \sum_{j \neq i} b_j + E[x a_i \cdot S]$$

The end product of this process is a vector of feature weights, and this featural information can be used to produce a word or item response. The target vector $b_i$ is weighted by $cs$, the product of the context match and the item self-match, whereas the other list targets $b_j$ are weighted only by $cm$. Because $s$ is large relative to $m$, the retrieved vector is dominated by the target.
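Retrieval can be illustrated in the same way. In this hedged sketch the rank-three memory is pre-multiplied by the matrix cue $x a_i$, and the resulting vector of feature weights is compared against all candidate targets to produce an item response; again the vectors are random stand-ins and $S$ is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 5
x = rng.normal(0, 1 / np.sqrt(n), n)
a = rng.normal(0, 1 / np.sqrt(n), (k, n))  # cue items a_j
b = rng.normal(0, 1 / np.sqrt(n), (k, n))  # target items b_j

# Rank-three memory M = sum_j x a_j b_j (S omitted for simplicity).
M = sum(np.einsum('i,j,k->ijk', x, a[j], b[j]) for j in range(k))

# Pre-multiply M by the matrix cue x a_i; the result is a vector that is
# approximately c*s*b_i plus small c*m contributions from the other pairs.
i = 0
retrieved = np.einsum('i,j,ijk->k', x, a[i], M)

# Compare the retrieved vector against all candidate targets and respond
# with the best-matching item.
scores = b @ retrieved
print(scores.round(3), '-> respond with b_%d' % np.argmax(scores))
```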
In the last two sections we have seen how, in a mathematical sense, the Matrix Model distinguishes between matching and retrieval tasks. In the next section, we will examine the episodic/semantic distinction by using the Matrix Model to simulate data generated by Bain & Humphreys (1989).
The subjects were grouped into three test conditions. Group A was asked to give a general familiarity rating for the words (a generalized matching condition). Group B was asked to recognise which words had been in the synonym generation task (an episodic matching condition). Group C was asked to recognise which words had been in the passage reading task (also an episodic matching condition). The mean recognition and familiarity ratings are displayed in Figure 11.
Figure 11: Mean ratings for three tasks as a function of presentation list(s) and word frequency. (a) Familiarity rating task, (b) recognition of synonym-task words, and (c) recognition of passage-task words. Note that in the generalized familiarity task, ratings depended only on the frequency of the word. For the episodic tasks, however, the lists in which the subjects were exposed to the word are critical.
As Figure 11 shows, subjects performing the episodic matching tasks were affected by the training context indicated in the task instructions, while subjects performing the general matching task were not influenced by the prior training conditions. Furthermore, subjects had no trouble reinstating the synonym context rather than the passage context, and vice versa.
These results suggest that subjects are able to distinguish episodic and semantic (or generalized) memory tasks quite well. One explanation is that the episodic and semantic memory systems are located in two different compartments in the brain: in the generalized familiarity task subjects access the semantic store, while in the episodic recognition task they access the episodic store. This may well be the case; however, Humphreys, Bain and Pike (1989) showed, using the Matrix Model, that it need not be. The episodic/semantic distinction can be captured in a single coherent memory system by assuming differences in the types of cues supplied.
In the following exercises, the Matrix Model will be used to demonstrate how the difference between generalized familiarity and episodic recognition can be captured. To simplify the modelling process we assume a design similar to that employed by Bain and Humphreys (1989), but in which only one study list is presented. What we are looking for is a difference in the pattern of results for target and distractor words when asking for generalized familiarity versus episodic recognition. The key distinction, from the model's point of view, is in the nature of the context cue. In episodic recognition it will be assumed that the context cue is the same as that at study. In contrast, when modelling generalized familiarity the context cue will be a vector in which all components are 0.1. This context vector will be similar to all of the pre-experimental contexts and the study context to approximately the same degree, and will therefore produce an output which is approximately the mean of all exposures - not just the study list exposures.
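For readers who prefer to see this logic outside the simulator, here is a plain NumPy sketch of the same comparison. The sparse patterns (three units active to different degrees) anticipate the pattern sets described in the exercises below, but the seed, dimension and number of pre-experimental contexts are invented stand-ins for BrainWave's actual pattern files.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10  # units per vector, so the generalized context components are 1/n = 0.1

def sparse_pattern():
    """Three units active, to different degrees, as in the pattern sets."""
    v = np.zeros(n)
    v[rng.choice(n, 3, replace=False)] = rng.uniform(0.5, 1.0, 3)
    return v

word = sparse_pattern()                              # a high frequency word
pre_contexts = [sparse_pattern() for _ in range(3)]  # pre-experimental contexts
study_context = sparse_pattern()

# Hebbian training: three pre-experimental exposures in different contexts,
# then a single study-list exposure in the study context.
M = sum(np.outer(c, word) for c in pre_contexts) + np.outer(study_context, word)

generalized_context = np.full(n, 0.1)  # equally similar to all contexts

for name, ctx in [('episodic cue', study_context),
                  ('generalized cue', generalized_context)]:
    print(name, round(float(np.sum(np.outer(ctx, word) * M)), 3))
```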
Figure 1: The Matrix Model applied to Bain and Humphreys (1989).
Exercise 1: This network contains three sets of units: the input units, which will contain the context vectors; the output units, which will contain the items to which a context is associated; and the match units, which contain the item to be tested. A matrix of weights connects the input units to the output units. What rank tensor does this network implement?
Above the units is a global value called "Dot Product". This global value indicates the dot product of the output units and the match units and is updated when you click on the DotProduct button. It is this value which will indicate the strength of a match in both the episodic recognition and generalized familiarity conditions.
In addition, there are three collections of pattern sets. The pre-experimental sets contain the input/output pairs representing the subjects' experience before entering the experiment. Each context is different, indicating that subjects' pre-experimental experience with words arises from many different contexts. Each context vector has just three units active, and these units are active to different degrees. The same is true for the output patterns, which represent the words. However, some of the word patterns are repeated, representing the difference between high and low frequency words. The high frequency words are repeated three times, while the low frequency words appear just once. Note that real words occur much more often than this; we have decreased the numbers here to facilitate modelling. It is important to consider, however, what effect increasing the number of presentations would have. A later exercise will be directed towards this question.

In the pre-experimental output set (as well as the match and experimental output sets), the words are followed by a tag such as hft or lfd. The hf or lf stands for high or low frequency respectively, and the t or d stands for target or distractor. This tag simply allows you to remember the type of each word without having to cycle through the relevant pattern sets.
Exercise 2: Click through the pre-experimental output set. How many presentations are there? How many unique words are there?
The experimental set represents a subject's experience during the study list. At study each word is presented once, so in the experimental output set each word appears just once. In all cases the study context is the same. Note that only target words appear in the experimental list.
Exercise 3: Click through the experimental output set. How many words are there?
The final collection of pattern sets are those that will be used for testing the network. The Test Contexts set contains the Study Context pattern and the Generalized Context pattern. When testing episodic recognition, the Study Context pattern should be selected; when testing generalized familiarity, the Generalized Context pattern should be selected. The Test Items set contains a copy of each of the words - both the targets and the distractors.
Exercise 4: Click through the Test Items set. How many words are there?
Now we are ready to train and test the system. Train the network for one epoch with the Pre-experimental input and output sets and then for one epoch with the Experimental input and output sets.
To test whether the network is familiar with a word in the study context, or is familiar with a word generally (it can be both): select the word from the Test Items set, select either the Study Context or the Generalized Context from the Test Contexts set, and Feedforward once. Clicking on the Dot Product button will show the strength of the match.
Exercise 5: Simulate the generalized familiarity task and fill in the dot product values in Table 2 below.
Table 2: Generalized Familiarity Task: Dot Product Values
| High Frequency Target | Dot Product | Low Frequency Target | Dot Product | High Frequency Distractor | Dot Product | Low Frequency Distractor | Dot Product |
| --- | --- | --- | --- | --- | --- | --- | --- |
| child |  | avery |  | horse |  | crept |  |
| phone |  | elope |  | space |  | flank |  |
| woman |  | adage |  | eight |  | broth |  |
| light |  | dally |  | sound |  | envoy |  |
| visit |  | graft |  | april |  | aural |  |
| green |  | banjo |  | leave |  | debit |  |
| river |  | fidel |  | table |  | guise |  |
| MEAN |  | MEAN |  | MEAN |  | MEAN |  |
Exercise 6: Simulate the episodic recognition task and fill in the dot product values in Table 3 below.
Table 3: Episodic Recognition Task: Dot Product Values
| High Frequency Target | Dot Product | Low Frequency Target | Dot Product | High Frequency Distractor | Dot Product | Low Frequency Distractor | Dot Product |
| --- | --- | --- | --- | --- | --- | --- | --- |
| child |  | avery |  | horse |  | crept |  |
| phone |  | elope |  | space |  | flank |  |
| woman |  | adage |  | eight |  | broth |  |
| light |  | dally |  | sound |  | envoy |  |
| visit |  | graft |  | april |  | aural |  |
| green |  | banjo |  | leave |  | debit |  |
| river |  | fidel |  | table |  | guise |  |
| MEAN |  | MEAN |  | MEAN |  | MEAN |  |
Exercise 7: Produce graphs similar to those in Figure 11 for the mean values of the dot products. That is, plot the mean dot product values for targets and distractors, for both low and high frequency words, in the generalized familiarity condition on one graph, and the corresponding means for the episodic recognition condition on another graph. Are the generalized familiarity graphs flatter than the episodic recognition graphs? Why?
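If you would rather produce the graphs programmatically, a minimal matplotlib sketch follows; the numbers shown are placeholders that you should replace with the means from your own Tables 2 and 3.

```python
import matplotlib.pyplot as plt

# Placeholder means; substitute the values from your own tables.
freq = ['low', 'high']
generalized = {'targets': [0.3, 0.7], 'distractors': [0.3, 0.7]}
episodic = {'targets': [0.8, 0.9], 'distractors': [0.1, 0.2]}

fig, axes = plt.subplots(1, 2, sharey=True)
for ax, data, title in [(axes[0], generalized, 'Generalized familiarity'),
                        (axes[1], episodic, 'Episodic recognition')]:
    for label, means in data.items():
        ax.plot(freq, means, marker='o', label=label)
    ax.set_title(title)
    ax.set_xlabel('Word frequency')
axes[0].set_ylabel('Mean dot product')
axes[0].legend()
plt.show()
```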
Exercise 8: In the generalized familiarity graph, the model's results tend not to be as flat as the subjects' data. Why might this be the case, and does it represent a refutation of the model? (Hint: consider the nature of pre-experimental experience.)
Halford, G. S., Wiles, J., Humphreys, M. S., & Wilson, W. H. (1992). Parallel distributed processing approaches to creative reasoning: Tensor models of memory and analogy. Unpublished manuscript.

Humphreys, M. S., Bain, J. D., & Burt, J. S. (1989). Episodically unique and generalized memories: Applications to human and animal amnesics. In S. Lewandowsky, J. C. Dunn, & K. Kirsner (Eds.), Implicit memory: Theoretical issues (pp. 139-158). Hillsdale, NJ: Erlbaum.

Humphreys, M. S., Bain, J. D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic and procedural tasks. Psychological Review, 96, 208-233.

Pike, R. (1984). A comparison of convolution and matrix distributed memory systems. Psychological Review, 91, 281-294.

Wiles, J., & Humphreys, M. S. (1993). Using artificial neural networks to model implicit and explicit memory. In P. Graf & M. Masson (Eds.), Implicit memory: New directions in cognition, development, and neuropsychology (pp. 141-166). Hillsdale, NJ: Erlbaum.