Long Term Memory: Matching versus Retrieval, Episodic versus Semantic

Simon Dennis and Rachael Gibson
Special thanks to Jill White

Introduction

Long term memory has a profound effect on the way we live our everyday lives and is integral to many aspects of cognition. Whether we are recalling episodes of childhood abuse in the witness box, retrieving our favourite pumpkin scone recipe, or just trying to remember where we parked the car, memory plays a key role in allowing us to function, providing the texture of our experiences and defining our identities.

Most of us have had the annoying experience of recognizing a face but not being able to remember to whom it belongs. We curse, make a reference to our age and reach for the nearest pop-psychology manual on how to improve our memories in ten easy steps. When you stop to think about it, however, the really astonishing thing is how often we don't forget. In a study by Shepard (1967) subjects were given a list of 540 arbitrary words to remember. On a forced-choice test they scored at just under 88% correct. That's impressive. Furthermore, I could ask you what you were doing at 12:30pm one week ago and there is a good chance you would answer correctly, even if it isn't something you do at that time every week. Yet between then and now you have had thousands of experiences, any one of which I could have queried. Somehow all of those experiences have left their mark, often without you even thinking about it. Long term memory is big.

The other really astonishing thing about memory is how flexibly we can access it. Suppose I ask you to list all the situation comedies you have ever watched. Most people from television dependent cultures won't have too many difficulties coming up with a reasonably large collection. Yet how could you answer this question given the obscure cue "situation comedy"? A database system might store a set of records such as {(Gilligan's Island, situation comedy), (Full House, situation comedy), (World News, current affairs), etc.} and then cycle through these one at a time retrieving those that have the situation comedy tag. But this would require that when you are watching television shows you are constantly tagging them as situation comedy, current affairs etc. If you had tagged the program as "the funny show with the skinny sailor in it" the search for situation comedy would come back with NO RECORDS FOUND. Human memory is much more robust. We often use what seem to be very obscure cues yet we are able to retrieve well.

Much of the research into human memory has been an exploration of just how flexible our memories are - of the different sorts of questions that we can use our memories to answer. In this chapter, we will consider two ways in which the questions that we ask of our long term memories can differ. The first of these refers to the nature of the output that a question requires - are we asking for a specific name, word or other item from memory (retrieval) or do we require a yes/no answer about whether we remember some fact or episode (matching). The second distinction refers to the role of context in the query - are we asking about what happened in a given episode or context (episodic) or is the query about the way that things tend to be in general (semantic). After considering these tasks, we will look in some detail at the Matrix Model of long term memory (Pike, 1984; Humphreys, Bain & Pike, 1989) which provides a theory of how these questions could be answered using the formalism of matrix algebra.

Matching Versus Retrieval Tasks: The Nature of the Output

Memory tasks differ with respect to the output that is required. The usual task that people have in mind when they think about human memory is one in which you are given one piece of information such as someone's face and you must retrieve another piece of information such as that person's name. These sorts of tasks seem to rely on a discrete or discontinuous form of information. The required output is an item. Two common examples of retrieval tasks are free association and cued recall with a list associate.

Free Association provides a subject with a cue (e.g. "Type of animal") and requires them to respond with the first word that comes to mind (e.g. cat). Sometimes a prior study list is presented to the subject and it has been shown that words that occur in the prior study list are more likely to be produced even though the subject is not instructed to use the study list to make their decisions. Free association is a retrieval task because the required response is a word.

Cued Recall with a List Associate requires the subject to study a list of pairs of words. At test, subjects are given a list of words and asked which word occurred with each of the test words during the study episode (e.g. "Which word occurred with boy in the study list?"). Again, cued recall with a list associate is a retrieval task because it requires the subject to respond with a word.

In contrast, some memory tasks, known as matching tasks, require what seems to be a more quantitative answer based on a continuous measure. The examples with which we will be concerned are familiarity rating and recognition.

Familiarity rating refers to a task in which subjects rate (on a five point scale, for instance) how familiar a word is to them in general (e.g. "How familiar is the word house to you?"). Familiarity rating is a matching task because it seems to be based on a continuous form of information.

Recognition requires a subject to study a list of words. At test, the subject is given a second list of words - some of which appeared in the first list and some of which did not. The subjects' task is to distinguish the targets (words that were on the list) from distractors (words that weren't on the list). Either they are asked to make a yes/no decision or they provide their confidence that the word is old (which might be rated on a five point scale, for instance). Recognition is considered a matching task because it relies upon a continuous form of information.

Episodic and Semantic Tasks: The Role of Context

Another important dimension on which memory tasks can vary is whether they make reference to a study episode. Tasks that do specify the study episode are known as episodic tasks, whereas tasks that do not are known as semantic tasks (or generalized tasks). Tulving (1972) realized that the learning in a study episode is not continuous with the learning that occurs before study. In particular, he realized that when we ask a subject in a recognition task do they recognize a word we are not asking them whether they know the word at all (often all of the test words are known to the subject). What we are asking is if the word occurred in a given list (the study list). Similarly, in cued recall with a list associate, we are not asking what word generally goes with boy. We are asking what word went with boy in the study list. In contrast, familiarity rating and free association make no specific reference to a study episode and are semantic tasks. Table 1 categorizes the four memory tasks described above in terms of the nature of the output and the role of context.

Table 1: Examples of Episodic/Semantic Tasks and Matching/Retrieval Tasks (adapted from Humphreys, Bain & Pike, 1989)

                                    Use of Context Cue
Access Process                      Episodic Memory    Semantic Memory
Matching (produces a rating value)  Recognition        Familiarity Rating
Retrieval (produces a word)         Cued Recall        Free Association

Before looking at how the Matrix Model accounts for the differences between these tasks we need a grounding in tensors, the operations that can be performed on tensors, and how tensors can be mapped to neural network architectures. If you are comfortable with these ideas you should continue on to the next section. If not, we suggest you first complete the Hebbian learning chapter.


The Matrix Model

The Matrix Model of Memory was developed by Humphreys, Bain and Pike (1989) and Pike (1984) to provide a coherent theoretical account of a range of different memory tasks, including episodic tasks, such as recognition and recall, and semantic tasks, such as familiarity rating and indirect production tasks. It is a distributed associative model in which items are modelled and stored as vectors of feature weights or elements, just as was the case in the previous section. Elements within each vector contribute conjointly to the representation of items. Thus memory representations are not located at specific points within a memory network, or within specific memory systems. Instead they are conceptualised as unique patterns of activation over a common set of elements. Typically, these patterns are thought to be sparse representations meaning that only a few of the elements are active.
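A sparse pattern of activation over a common set of elements can be sketched in a few lines of numpy. This is an illustrative fragment only: the number of elements, the number of active elements, and the activation levels are arbitrary choices, not values prescribed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # number of elements shared by all representations

def sparse_item(active=3):
    """A sparse pattern of activation: only a few of the n
    elements carry non-zero feature weights."""
    v = np.zeros(n)
    idx = rng.choice(n, size=active, replace=False)
    v[idx] = rng.uniform(0.5, 1.0, size=active)
    return v

house = sparse_item()
melody = sparse_item()
# Both items live over the same n elements; each is a distinct pattern
print(np.count_nonzero(house), np.count_nonzero(melody), house.shape)
```

Note that there is no dedicated location for "house"; it is simply one pattern of weights over the common pool of elements.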

Memory Representations

The memory representations in the Matrix Model include items, contexts, or combinations of items and contexts (associations).
  1. Items - Items can be any sort of stimuli including words, pictures, melodies etc. For the most part, however, the experiments to which the model has been applied use words. Each item is modelled as a vector of feature weights. Feature weights are used to specify the degree to which certain features form part of an item. There are two possible levels of vector representation for items. These are:
    1. modality specific peripheral representations (e.g., graphemic or phonemic representations of words)
    2. modality independent central representations (e.g., semantic representations of words).

    Item vectors are distinguished by subscripts (e.g. ai). A distractor vector is indicated by a d.

  2. Contexts - To distinguish between episodic and non-episodic tasks the Matrix Model assumes the episode or context in which items are studied is also represented by a vector of feature weights. In episodic tasks this context vector must be reinstated so that it may be used as a cue to the memory system. The context vector is represented by an x.
  3. Associations - While individual items and contexts are represented as single vectors (a, b, x), associations between items and contexts are represented by matrices derived from the matrix product of these vectors. The resulting matrix product represents the association (or binding) between either items, or between items and context. The memory of the Matrix Model is formed by adding these associations together. The model posits a number of different kinds of associations including:
    1. Two-way associations between a single item (a) and a context (x, e.g. bacon x breakfast this morning) are represented as a context-to-item association (x a), where

      x = n element column vector
      a = n element row vector

    2. Associations between a list of items (a1, a2,...,ak) and a context (x) are represented by multiplying each of the item vectors by the context vector and summing the resulting matrices. This sum represents the memory of the study list (E).

      E = x a1 + x a2 + ... + x ak

    3. Three-way associations between a list of word pairs (a1 b1, a2 b2, ... ak bk) and context (x, e.g., bacon x dog x breakfast this morning) are represented by the rank three tensor (x aj bj), where

      x = n element column vector
      aj = n element row vector
      bj = n element orthogonal vector

      The resulting associations can be summed to form the memory for the list (E).

      E = x a1 b1 + x a2 b2 + ... + x ak bk

      Note: the type of vector (i.e., row, column, orthogonal) can also be inferred from the order of the vector symbols, where: 1st vector = column vector; 2nd vector = row vector; and 3rd vector = orthogonal.

    4. Pre-existing memories (S) are added to list memories (E) because test performance can be influenced by both list memories and pre-existing memories.

      M = x a1 b1 + x a2 b2 + ... + x ak bk + S = E + S
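As a concrete sketch of these constructions, the outer products and summed memories can be written in a few lines of numpy. The vectors below are arbitrary random stand-ins for item and context representations, not patterns taken from any experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
x = rng.normal(size=n)                      # context vector
A = [rng.normal(size=n) for _ in range(4)]  # items a1..ak
B = [rng.normal(size=n) for _ in range(4)]  # items b1..bk

# Two-way association: the outer product of context and item
assoc = np.outer(x, A[0])                   # x a1, a rank-2 matrix

# Memory for a list of single items: E = x a1 + x a2 + ... + x ak
E2 = sum(np.outer(x, a) for a in A)

# Three-way associations: rank-3 tensors x aj bj, summed over the list
E3 = sum(np.einsum('i,j,k->ijk', x, a, b) for a, b in zip(A, B))

print(assoc.shape, E2.shape, E3.shape)
```

The rank-2 memory is an n x n matrix and the rank-3 memory is an n x n x n tensor, regardless of how many associations have been added in.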

    Accessing Memory Representations

    Having constructed the memory matrix, we can now see how the Matrix Model goes about accessing this representation at test for a number of different tasks. All retrieval in the Matrix Model is direct. The memory matrix is presented with cues and access occurs in parallel. There is no sequential search process. Presenting the model with a cue involves taking the inner product (or dot product) of the cue vector with the memory matrix.

    One of the strengths of the Matrix Model is the number of ways in which information from the model can be accessed. In the introductory section, two dimensions on which tasks can differ were outlined. The first was the matching/retrieval dimension. Matching tasks are those based on a continuous form of information that typically require either a yes/no answer or a rating response (e.g. recognition). Retrieval tasks, by contrast, require a specific item to be returned (e.g. cued recall). This distinction is captured in the Matrix Model by the rank of the tensor that results once all cues have been applied. If the resultant tensor is a scalar we are dealing with a matching process. This scalar can be compared against criteria to determine a yes/no or rating value. If the resultant tensor is a vector then we have a retrieval process. The vector can be compared against all item vectors, with the most similar item being the output of the process. The next sections go through the mathematics of recognition and cued recall with a list associate, demonstrating how matching and retrieval tasks are accomplished within the model.
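This rank bookkeeping can be verified numerically. In the sketch below (arbitrary random vectors, with numpy's einsum performing the contractions) a rank-2 cue applied to a rank-2 memory leaves a scalar, while the same cue applied to a rank-3 memory leaves a vector.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
x, a, b = (rng.normal(size=n) for _ in range(3))

M2 = np.outer(x, a)                     # rank-2 memory (a list of items)
M3 = np.einsum('i,j,k->ijk', x, a, b)   # rank-3 memory (a list of pairs)
cue = np.outer(x, a)                    # combined context-plus-item cue

match = np.einsum('ij,ij->', cue, M2)        # scalar: a matching strength
retrieved = np.einsum('ij,ijk->k', cue, M3)  # vector: a retrieved item

print(np.ndim(match), retrieved.shape)
```

Each cue contracts away two dimensions of the memory, so the rank of the output is simply the rank of the memory minus the rank of the cue.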

    The second task dimension discussed in the introduction was the episodic/semantic dimension. Episodic tasks refer to a specific context, whereas in semantic (generalized) tasks information is integrated over a large number of experiences. The Matrix Model captures this distinction. In episodic tasks, a reinstated context vector is used as a cue. In semantic (generalized) tasks, a vector which is equally similar to all contexts is used so as to average over all experiences with the cue items (typically, this is a vector with all components set to 1/n where n is the dimension of the vector). The section entitled "Episodic versus Semantic Memory: Cuing with the Context Vector" describes an experiment designed to demonstrate the importance of the distinction and leads you through the process of modelling this experiment using the BrainWave simulator.

    Matching Versus Retrieval Tasks: Scalar or Vector Output

    In this section, a matching task, namely recognition, and a retrieval task, namely cued recall with a list associate, are compared within the Matrix Model framework.

    Recognition

    Recognition involves a matching process, where the overall similarity between the test cues (x and ai) and memory (M) is calculated. Because this is an episodic task, the test cues involve both word cues and a context cue. This episodic matching process is accomplished by combining the test cues into an associative matrix (x ai) and determining a dot product between:
    1. the cue matrix (x ai), and
    2. the memory matrix (M = x a1 + x a2 + ... + x ak + S).
    [Note: Because the dot product operation is associative, the results are identical regardless of whether you form a combined x ai matrix and then take a dot product or take the dot product of each of the cues with the memory matrix progressively.]

    Studied Test Word (ai)

    x ai . M = x ai . (Σj x aj + S)
             = Σj x ai . x aj + x ai . S
             = Σj (x . x)(ai . aj) + x ai . S
             = (x . x)(ai . ai) + Σj≠i (x . x)(ai . aj) + x ai . S

    Taking expected values:

    E[x ai . M] = c s + (k - 1) c m + g
    where
    c = similarity between the study and test contexts (assumed to be large)
    s = similarity between the same word encoded at study and test (assumed to be large)
    m = similarity between different words at study and test (assumed to be small)
    g = contribution of pre-existing memories
    Non-studied Test Word (d)

    x d . M = x d . (Σj x aj + S)
            = Σj x d . x aj + x d . S
            = Σj (x . x)(d . aj) + x d . S

    Taking expected values:

    E[x d . M] = k c m + g
    Note that the matching operations in the above equations can be collapsed into several components, including:
    1. a match between the test cue and the pre-experimental memories (i.e., x ai . S or x d . S), and
    2. a match between the test cue and the experimental memories
      (i.e., x ai . x aj or x d . x aj )

    The match between the test cue and the experimental memories can be further collapsed into:

    1. a match between the context on study and test occasions (x . x = c), and
    2. a match between the study and test items (ai . ai = s and ai . aj = m) or (d . aj = m)

    Thus the final dot product derived from these equations represents the match of the contexts on the study and test occasions (c), weighted by the match of the items on the study and test occasions (s and m). Consequently, memories that are conjointly defined by the context and test cues are weighted more heavily than items not studied in that context. This mechanism enables the model to limit interference from other items studied in the same context, and from previous contexts in which items have appeared, because such traces receive only small weights.
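A small simulation illustrates this weighting. Random high-dimensional vectors stand in for the items and context, and the pre-existing memories S are omitted for simplicity; under these assumptions a studied word's match value reliably exceeds a distractor's.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 5
x = rng.normal(size=n)                          # study context
study = [rng.normal(size=n) for _ in range(k)]  # studied items a1..ak
M = sum(np.outer(x, a) for a in study)          # M = x a1 + ... + x ak

def match(context, item):
    """Dot product of the cue matrix (context x item) with memory."""
    return np.einsum('ij,ij->', np.outer(context, item), M)

target = study[0]
distractor = rng.normal(size=n)
# The (x . x)(ai . ai) term dominates the target's match value
print(match(x, target) > match(x, distractor))
```

With random high-dimensional vectors, x . x and ai . ai are large while cross terms such as ai . aj hover near zero, mirroring the roles of c, s and m in the expected values above.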

    Cued Recall with a List Associate

    Cued recall with a list associate involves a subject studying a list of pairs. At test they are given an item and are required to produce the word with which it was paired at study. This is an important task because it can be used to demonstrate that three-way associations are necessary to model human memory. Simple two-way associations between items are insufficient (Humphreys, Bain & Pike, 1989).

    For this reason, cued recall with a list associate is modelled using rank three tensors that associate word pairs (a1 b1, a2 b2,... ak bk) and context (x). The tensor is formed by taking the outer product of the context vector x and the two item vectors, aj and bj.

    M = x a1 b1 + x a2 b2 + ... + x ak bk + S

    Subjects are then asked to recall list targets (bi) at test, using list associates (ai) and context (x) as cues. The retrieval cues (x and ai) are combined to form an associative matrix cue (x ai). Retrieval then involves the pre-multiplication of the rank three tensor (M) by the retrieval cue (x ai).

    x ai . M = x ai . (Σj x aj bj + S)
             = Σj [(x ai) . (x aj)] bj + x ai . S
             = Σj (x . x)(ai . aj) bj + x ai . S
             = (x . x)(ai . ai) bi + Σj≠i (x . x)(ai . aj) bj + x ai . S

    Taking expected values:

    E[x ai . M] = c s bi + Σj≠i c m bj + g

    The end product of this process is a vector of feature weights. This featural information can be used to produce a word or item response.

    The target vector is weighted by:

    1. the similarity of the context on the study and test occasions ( x . x = c), and
    2. the similarity of the list cue on the study and test occasions (ai . ai = s) and (ai . aj = m)
    Note that the weights for the same associate (s) will be greater than the weights for different associates (m) making the resulting vector look more like the correct associate (on average) than any other item. Noise will also be generated by the pre-existing memories. The assumption is that, in general, the similarity of the pre-existing contexts and the current context will be small leading to low levels of interference. Of course, if a recent context also included the cue word then much more interference will be generated because the context vectors will be more similar.
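The retrieval computation can be sketched in the same style, again with arbitrary random stand-in vectors and S omitted. Cueing the rank-3 memory with x a1 yields a vector that resembles the paired target b1 more than any other list item.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 5
x = rng.normal(size=n)
A = [rng.normal(size=n) for _ in range(k)]  # cue items a1..ak
B = [rng.normal(size=n) for _ in range(k)]  # target items b1..bk

# M = x a1 b1 + ... + x ak bk (pre-existing memories S omitted)
M = sum(np.einsum('i,j,k->ijk', x, a, b) for a, b in zip(A, B))

# Pre-multiply the rank-3 memory by the cue matrix x a1
retrieved = np.einsum('ij,ijk->k', np.outer(x, A[0]), M)

# Compare the retrieved vector against all list targets
similarities = [retrieved @ b for b in B]
print(int(np.argmax(similarities)))
```

The correct associate wins because its weight, (x . x)(a1 . a1), is much larger than the cross-term weights attached to the other targets.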

    In the last two sections we have seen how, in a mathematical sense, the Matrix Model distinguishes between matching and retrieval tasks. In the next section, we will examine the episodic/semantic distinction by using the Matrix Model to simulate data generated by Bain & Humphreys (1989).

    Episodic versus Semantic Memory: Cuing with the Context Vector

    Bain & Humphreys (1989, p. 229) report an experiment which clearly demonstrates the difference between episodic and semantic matching tasks by reinstating the context during some, but not all, of the test conditions. Subjects were given a set of words and asked to produce a synonym for each. One week later the same subjects were given a passage containing unhighlighted target words, and asked to read the text and then answer questions on it. Half of the target words were common to both training stages. In addition to the test items already mentioned (synonym, passage, or both), words which appeared in neither training stage were also included as test items. Each set of test items contained equal numbers of high and low frequency words.

    The subjects were grouped into three test conditions. Group A was asked to give a general familiarity rating for the words (a generalized matching condition). Group B was asked to recognise which words had been in the synonym generation task (an episodic matching condition). Group C was asked to recognise which words had been in the passage reading task (also an episodic matching condition). The mean recognition and familiarity ratings are displayed in Figure 11.

    Figure 11: Mean ratings for three tasks as a function of presentation list(s) and word frequency. (a) Familiarity Rating Task (b) Recognition of Synonym Task words and (c) Recognition of Passage Task words. Note that in generalized familiarity task ratings depended only on the frequency of the word. For the episodic tasks, however, the lists in which the subjects were exposed to the word are critical.

    As Figure 11 shows, subjects performing the episodic matching tasks were affected by the training context indicated in the task instructions, while subjects performing the general matching task were not influenced by the prior training conditions. Furthermore, subjects had no trouble reinstating the synonym context as opposed to the passage context, or vice versa.

    These results suggest that subjects are able to distinguish episodic and semantic (or generalized) memory tasks quite well. One explanation is that the episodic and semantic memory systems are located in two different compartments in the brain: in the generalized familiarity task subjects access the semantic store, while in the episodic recognition task subjects access the episodic store. This may well be the case; however, Humphreys, Bain and Pike (1989) showed using the Matrix Model that it need not be. The episodic/semantic distinction can be captured in a single coherent memory system by assuming differences in the types of cues supplied.

    In the following exercises, the Matrix Model will be used to demonstrate how the difference between generalized familiarity and episodic recognition can be captured. To simplify the modelling process we assume a design similar to that employed by Bain and Humphreys (1989), but in which only one study list is presented. What we are looking for is a difference in the pattern of results for target and distractor words when asking for generalized familiarity versus episodic recognition. The key distinction, from the model's point of view, is in the nature of the context cue. In episodic recognition it will be assumed that the context cue is the same as that at study. In contrast, when modelling generalized familiarity the context cue will be a vector in which all components are 0.1. This context vector will be similar to all of the pre-experimental contexts and to the study context to approximately the same degree, and will therefore produce an output which is approximately the mean of all exposures - not just the study list exposures.
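Before turning to the simulator, the logic of the two cues can be sketched directly. In the hypothetical fragment below, a frequent word appears in three pre-experimental contexts but not in the study list; a generalized cue with equal components rates it as familiar, while the reinstated study context instead singles out the studied target. The sparse vectors and their sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500

def sparse():
    """Hypothetical sparse nonnegative feature vector."""
    v = np.zeros(n)
    v[rng.choice(n, size=10, replace=False)] = rng.uniform(0.5, 1.0, size=10)
    return v

pre_contexts = [sparse() for _ in range(3)]  # a word's prior history
study_context = sparse()
hf_word = sparse()    # frequent word: seen before, but not studied
target = sparse()     # studied once, in the study context
new_word = sparse()   # never seen at all

# Memory: pre-experimental traces (S) plus the study-list trace (E)
M = (sum(np.outer(c, hf_word) for c in pre_contexts)
     + np.outer(study_context, target))

def match(context, item):
    return np.einsum('ij,ij->', np.outer(context, item), M)

general = np.full(n, 0.1)  # equally similar to every context

# The generalized cue rates the frequent word as familiar,
# while the reinstated study context singles out the studied target
print(match(general, hf_word) > match(general, new_word))
print(match(study_context, target) > match(study_context, hf_word))
```

The same memory matrix answers both questions; only the context cue changes.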

    Figure 1: The Matrix Model applied to Bain and Humphreys (1989).

    Exercise 1: This network contains three sets of units - the input units, which will contain the context vectors, the output units, which will contain the items to which a context is associated, and the match units, which contain the item to be tested. Weights connect the input units to the output units. What rank tensor does this network implement?

    Above the units is a global value called "Dot Product". This global value indicates the dot product of the output units and the match units and is updated when you click on the DotProduct button. It is this value which will indicate the strength of a match in both the episodic recognition and generalized familiarity conditions.

    In addition, there are three collections of pattern sets. The pre-experimental sets contain the input/output pairs representing the subjects' experience before entering the experiment. Each context is different, indicating that subjects' pre-experimental experience with words arises from many different contexts. Each context vector has just three units active and these units are active to different degrees. The same is true for the output patterns which represent the words. However, some of the word patterns are repeated, representing the difference between high and low frequency words. The high frequency words are repeated three times while the low frequency words appear just once. Note that real words occur much more often. We have decreased the numbers here to facilitate modelling. It is important to consider, however, what effect increasing the numbers of presentations would have. A later exercise will be directed towards this question. In the pre-experimental output set (as well as the match and experimental output sets), the words are followed by a tag such as hft or lfd. The hf or lf stands for high frequency and low frequency respectively, and the t or d stands for target or distractor. This tag just allows you to easily remember the type of each word without having to cycle through the relevant pattern sets.

    Exercise 2: Click through the pre-experimental output set. How many presentations are there? How many unique words are there?

    The experimental set represents a subject's experience during the study list. Words are all presented once at study, so each word appears just once in the experimental output set. In all cases the study context is the same. Note that only target words appear in the experimental list.

    Exercise 3: Click through the experimental output set. How many words are there?

    The final collection of pattern sets are those that will be used for testing the network. The Test Contexts set contains the Study Context pattern and the Generalized Context pattern. When testing episodic recognition the Study Context pattern should be selected; when testing generalized familiarity the Generalized Context pattern should be selected. The Test Items set contains a copy of each of the words - both the targets and the distractors.

    Exercise 4: Click through the Test Items set. How many words are there?

    Now we are ready to train and test the system. Train the network for one epoch with the Pre-experimental input and output sets and then for one epoch with the Experimental input and output sets.

    To test whether the network is familiar with a word in the study context, or is familiar with a word generally (it can be both): select the word from the Test Items set; select either the Study Context or the Generalized Context from the Test Contexts set and Feedforward once. Clicking on the Dot Product button will show the strength of the match.

    Exercise 5: Simulate the generalized familiarity task and fill in the dot product values in Table 1 below.

    Table 1: Generalized Familiarity Task: Dot Product Values

    High Frequency   Low Frequency   High Frequency   Low Frequency
    Target           Target          Distractor       Distractor
    child            avery           horse            crept
    phone            elope           space            flank
    woman            adage           eight            broth
    light            dally           sound            envoy
    visit            graft           april            aural
    green            banjo           leave            debit
    river            fidel           table            guise
    MEANS

    Exercise 6: Simulate the episodic recognition task and fill in the dot product values in Table 2 below.

    Table 2: Episodic Recognition Task: Dot Product Values

    High Frequency   Low Frequency   High Frequency   Low Frequency
    Target           Target          Distractor       Distractor
    child            avery           horse            crept
    phone            elope           space            flank
    woman            adage           eight            broth
    light            dally           sound            envoy
    visit            graft           april            aural
    green            banjo           leave            debit
    river            fidel           table            guise
    MEANS

    Exercise 7: Produce graphs similar to those in Figure 11 for the mean values of the dot products. That is, plot the mean dot product values for targets and distractors for both low and high frequency words in the generalized familiarity condition on one graph, and the corresponding means for the episodic recognition condition on another graph. Are the generalized familiarity graphs flatter than the episodic recognition graphs? Why?

    Exercise 8: In the generalized familiarity graph the model's results tend not to be as flat as the subjects' data. Why might this be the case, and does it represent a refutation of the model? (Hint: consider the nature of pre-experimental experience.)

    References

    Bain, J.D., & Humphreys, M.S. (1989). Instructional reinstatement of context: The forgotten prerequisite. In K. McConkey and A. Bennett (Eds.), Proceedings of the XXIV International Congress of Psychology, Vol. 3. Elsevier, North-Holland.

    Halford, G. S., Wiles, J., Humphreys, M. S., & Wilson, W. H. (1992). Parallel distributed processing approaches to creative reasoning: Tensor models of memory and analogy. Unpublished manuscript.

    Humphreys, M.S., Bain, J.D., & Burt, J.S. (1989). Episodically unique and generalized memories: Applications to human and animal amnesics. In S. Lewandowsky, J.C. Dunn & K. Kirsner (Eds.) Implicit Memory: Theoretical Issues. (pp. 139-158). Erlbaum Associates: Hillsdale, N.J.

    Humphreys, M.S., Bain, J.D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic and procedural tasks. Psychological Review, 96, 208-233.

    Pike, R. (1984). A comparison of convolution and matrix distributed memory systems. Psychological Review, 91, 281-294.

    Shepard, R. N. (1967). Recognition memory for words, sentences, and pictures. Journal of Verbal Learning and Verbal Behavior, 6, 156-163.

    Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of Memory (pp. 381-403). New York: Academic Press.

    Wiles, J., & Humphreys, M.S. (1993). Using artificial neural networks to model implicit and explicit memory. In P.Graf & M. Masson (Eds.) Implicit Memory: New Directions in Cognition, Development, and Neuropsychology. (pp. 141-166). Erlbaum: Hillsdale, New Jersey.