Attentional interference in judgments of musical timbre: individual differences in working memory

Michael D. Hall

Everyone has experienced being in a noisy environment with competing sources of incoming information. On a single day at any university, one can find students writing papers in a noisy computer laboratory while they monitor an instant chat and listen to their favorite CDs, whereas others will hide in a quiet corner of the library while they work on the same task. For many years, researchers have been interested in understanding why some people are more skilled in attending in a busy environment than are others, and one intriguing hypothesis is that working memory may be the key (Engle, 2002). In the present study, we examined that hypothesis by determining if working memory capacity influenced attentional interference in a dichotic listening task with tonal stimuli.

Nearly 30 years ago, Baddeley and Hitch (1974) first proposed their multicomponent theory of working memory, and a voluminous amount of research has confirmed its usefulness ever since. According to that theory, the working memory system consists of components to store visual and verbal information as well as a central executive to regulate the active contents of memory. Capacity limitations are dynamic and depend on the available pool of resources and the difficulty of the task, as well as the efficiency of the processes involved. Working memory is usually measured by using a complex span task in which participants must remember an item for later recall while they intermittently process new information. For example, in the Operations Span task (Turner & Engle, 1989) in the present study, a participant reads a mathematical operation out loud (e.g., (2 x 6)/3 = 4) and judges if it is correct. The operation is followed by a word (e.g., table) which is also read aloud. The participant continues to receive operations and words until asked to recall all of the words of the set. In other similar tasks, participants read or listen to a sentence and then are asked to later recall either an unrelated word or the last word of the sentence (Daneman & Carpenter, 1980).

Researchers have shown that working memory measured in that way is a good predictor of a wide range of tasks involving auditory and written language processing, problem solving, and reasoning (Blasko, 1999; Connine, Blasko, & Wang, 1994; Daneman & Carpenter, 1980; Engle, Tuholski, Laughlin, & Conway, 1999; Gilhooly, Logie, Wetherick, & Wynn, 1993; Just & Carpenter, 1992). Indeed, it has been shown to correlate with so many tasks that some researchers have hypothesized that it may be Spearman’s g factor (Kyllonen & Christal, 1990). Although that claim is controversial, some research results have supported the idea that working memory accounts for much of the variance in classic tests of general fluid intelligence, such as the Raven’s Progressive Matrices. For example, Conway, Cowan, Bunting, Therriault, and Minkoff (2002) tested 120 young adults on measures of working memory, short-term memory, processing speed, and general fluid intelligence. By using structural equation modeling, they found that working memory was the best predictor of scores on measures of fluid intelligence.

The nature of that relationship is only beginning to be understood. One hypothesis is that working memory capacity is a resource for the central executive, which allows it to activate important, relevant information and inhibit irrelevant information. If that is the case, then working memory capacity may have less to do with memory than it does with the efficient use and allocation of attentional resources (Engle, 2002; Kane & Engle, 2003). Thus, measures of working memory capacity should be predictive not only of complex problem-solving abilities, but also of classic attention tasks in which there is competition for attentional resources.

The findings of some recent research support that idea. In a classic dichotic listening paradigm, participants are asked to listen to and shadow (repeat aloud) verbal information in one ear and to ignore information in the other ear. The listeners usually report hearing very little from the unattended ear, but occasionally report hearing their own names. In a recent replication study and extension, Conway, Cowan, and Bunting (2001) found that listeners with high working memory spans were much less likely to hear their name in the unattended ear (20%) than were listeners with low working memory spans, and thus, presumably, less working memory capacity (65%). Participants with high working memory spans also made fewer shadowing errors. That suggests that those with higher working memory spans were better at allocating attentional resources to the important information and ignoring the information that was irrelevant.

One of the most widely used measures of attentional allocation and interference is the classic Stroop paradigm (Stroop, 1935; for a review see MacLeod, 1991). In the most commonly used version of the task, participants are asked to name print color. In some conditions, a color may be displayed as a color block or a string of colored Xs (neutral condition); in other cases, the word red may be printed in red ink (congruent condition) or in another color ink (e.g., blue; the incongruent condition). The typical finding is that responding is slower and less accurate in the incongruent condition than it is in the congruent condition. In some cases, responding also is facilitated in the congruent condition relative to neutral control trials.

If working memory is indeed related to attentional control, then participants in the classic Stroop task with higher working memory capacity may be better able to focus on the intended dimension (ink color) and ignore the irrelevant dimension (the color word). Therefore, they should show less Stroop interference (usually indexed by slower and less accurate responses for the incongruent condition in comparison with the congruent condition) than do participants with relatively low working memory capacity. Long and Prat (2002) recently obtained evidence that is somewhat consistent with that possibility. They found that individuals with high working memory spans tend to be less susceptible to Stroop interference, but only in some conditions. Although Long and Prat found no differences in error rates, probably owing to ceiling effects, they did find evidence of interference (conflict vs. neutral trials) in response latencies. When the number of incongruent trials was very high (80%), the participants with low and high working memory spans showed similar levels of interference. However, when incongruent trials were very rare (20%) and there was a high frequency of neutral trials (XXXXX printed in red, yellow, blue, or green), the participants with low working memory spans showed much more interference in response latencies.

Not all researchers have found links between Stroop interference and general fluid intelligence. In the intelligence and neuropsychological literature, the findings are quite mixed (Kane & Engle, 2003). Kane and Engle investigated what factors might influence individual differences in Stroop interference. They suggested that although individuals with low working memory spans show rather consistent Stroop interference, those with high working memory spans may show a smaller Stroop effect in response accuracy, latency, or both, depending on several factors. The factors include (a) the percentage of conflict trials that are present, (b) the presence or absence of neutral trials (also see Long & Prat, 2002) or congruent trials with the conflict trials, (c) the strength of the instructional set induced by the preceding block, and (d) the presence or absence of feedback.

Kane and Engle (2003) hypothesized that Stroop interference is determined by at least two separate mechanisms–goal maintenance and response competition–on which working memory may have an impact. According to that view, one mechanism involves a process that maintains the task goal of not responding to information from the irrelevant stimulus dimension (e.g., a color word). That process should fail more often when task conditions maximize the need for goal maintenance, such as when initial stimulus conditions include a moderate proportion of congruent trials (i.e., matched color word and color ink). Such a transient failure of goal maintenance should be reflected in increased error rates for incongruent (i.e., mismatched color word and color ink) conditions relative to congruent conditions. Another proposed mechanism involves the time-consuming, attention-based resolution of competing responses (i.e., resulting from the processing of word and ink dimensions) after successful, active maintenance of the task goal. The hypothesis is that individuals with low working memory spans have enduring deficits in the response-competition mechanism, which are reflected by increases in response times to incongruent conditions.

One question that has been hotly debated in the working memory and attention literature is whether working memory and executive control processes are domain specific or domain general. For example, in the capacity theory of language, Just and Carpenter (1992) proposed that the reading span task they had developed measured only working memory for language. In later work, Shah and Miyake (1996) developed a spatial span task that, they argued, showed that spatial working memory was separable from working memory for language. Although the original working memory model proposed by Baddeley and Hitch (1974) suggested a central executive system, it also contained separate systems for active storage of auditory (phonological loop) and spatial (visuospatial sketchpad) information. Therefore, it is important to begin to investigate the influence of working memory on other tasks that require attentional control but that are less dependent on language processing.

It would be preferable to provide converging evidence from a task that could show attentional interference within a single domain of processing. Such a task would permit a relatively straightforward evaluation of the general nature of attention. In contrast, tasks involving different stimulus dimensions (such as color and word) typically differ in the degree to which they involve different types of processing, and thus, different attentional resources. Different stimulus dimensions also will typically affect task difficulty to a different extent and thus make disparate demands on the capacity of attentional resources. As a result, in studies of the traditional Stroop task, researchers have typically had to address issues of the relative automaticity of color and word processing. Ideally, an alternative task should also be difficult enough to avoid ceiling effects to permit an adequate evaluation of the hypothesis of Kane and Engle (2003) that errors and response latencies may reveal different subprocesses of attention.

Hall, Koch, and Griffith (2004) recently developed such a task by using musical stimuli. The participants were asked to listen to high or low clarinet and violin tones presented to the left ear, right ear, or both ears. They were asked to monitor one ear and to judge whether that tone was from a violin or clarinet. The key comparison was between the same-instrument condition and the different-instrument condition. In the different-instrument condition, a tone from one instrument was presented in the attended ear, while a tone from the other instrument was simultaneously presented in the unattended ear (e.g., violin right ear/clarinet left ear). In contrast, in the same-instrument condition, the same instrument (e.g., violin) was presented simultaneously in both ears (although they differed in pitch to maintain the distinction between tones). In the final condition, only one instrument was presented in the monitored ear (single-instrument condition).

Hall et al. (2004) showed in a series of experiments that the participants were fastest in the single-instrument condition, followed by the same-instrument condition, which reflected a cost associated with at least some processing of an additional distractor stimulus. The participants responded even more slowly in the different-instrument condition, which indicated additional processing costs associated with the presence of incongruent timbre information. The latter effect was not owing simply to the spectral features of the distractor tone because the same levels of interference were not obtained when noise with a similar spectral envelope was substituted for the different timbre in the unattended ear.

That task may have several potential benefits for the study of individual differences. First, it uses musical stimuli that vary in pitch, timbre, and spatial location across ears. Therefore, it requires very little verbal or linguistic information except at the point of response when the participant decides if the tone is a violin or clarinet by pressing V or C on a keyboard. Second, the task can be made relatively easy or much more difficult by varying the similarity between the timbres of the different instruments. As a result, the task avoids the issues of floor and ceiling effects often seen in the classic Stroop task. Third, whereas most adults are not as expert at judging musical timbre as they are at reading words, well-practiced musicians may not reflect a great disparity in the levels of their expertise on those tasks. Unlike traditional Stroop tasks, the timbre judgment task affords an opportunity to study practice and training effects within a general population of adults.

In the present work, we attempted to replicate the findings of Hall et al. (2004) and investigate whether their timbre judgment paradigm is sensitive to potential individual differences in working memory capacity. If the influence of working memory is primarily domain general, then the size of any attentional interference effect in terms of response times and/or performance accuracy should be moderated by working memory capacity. Specifically, participants with low working memory spans should show a larger difference than do participants with high working memory spans between the same- and different-instrument conditions, as well as between the single- and same-instrument conditions. Furthermore, if the dual-processing account that has been hypothesized for the Stroop effect (Kane & Engle, 2003) applies in a similar manner to the timbre task, then one would anticipate more consistent interference effects in response times in the different-instrument condition, particularly for listeners with low working memory spans. That would indicate difficulty with the resolution of competing responses. In contrast, corresponding effects for accuracy, which are argued to reflect a failure to maintain attention to the task goals (i.e., the target tone), should depend on the probabilities of congruency. As a result, such effects may not be consistently obtained, particularly for listeners with high working memory spans.

Method

Participants

The participants were 57 undergraduate students who received partial course credit for taking part in the experiment. The mean age was 19.87 years, and 60% were male. All participants reported normal hearing.

Materials and Equipment

Working memory. Working memory was assessed using the Operations Span task developed by Turner and Engle (1989). In all complex span tasks, the idea is to create a situation in which participants must maintain information while they process new information. In the Operations Span task, the participants read aloud a series of short math problems followed by a word (e.g., 4 x 2/3 = 3 horse). They are asked to respond true or false to the math problem and then to try to remember the words until they are asked to recall them. The number of trials increases from two per set to six per set. There are three blocks at each set size. In our version of the task, the block was scored correct only if all the words were recalled correctly. The span score was calculated as the set size at which the individual was correct on at least two of three blocks. An additional half a point was added for one out of three correct. For example, if an individual received a score of three out of three correct at set-size two, two of three correct at set-size three, and one of three correct at set-size four, then the final score would be 3.5.
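The scoring rule just described can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name and input format are hypothetical, and the sketch assumes that scoring stops at the first set size on which fewer than two of the three blocks were correct, which the description above does not state explicitly.

```python
def operation_span_score(correct_blocks):
    """Return a span score from per-set-size block counts.

    correct_blocks maps set size (2-6) to the number of blocks (0-3)
    in which every word was recalled correctly.
    Assumption: scoring stops at the first set size with fewer than
    two correct blocks.
    """
    span = 0.0
    for set_size in sorted(correct_blocks):
        n_correct = correct_blocks[set_size]
        if n_correct >= 2:
            span = float(set_size)   # at least two of three blocks correct
        else:
            if n_correct == 1:
                span += 0.5          # half-point credit for one of three
            break
    return span


# Worked example from the text: 3/3 at set size two, 2/3 at set size
# three, and 1/3 at set size four.
example = {2: 3, 3: 2, 4: 1, 5: 0, 6: 0}
print(operation_span_score(example))  # 3.5
```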

Musical experience questionnaire. All the listeners completed a musical experience questionnaire that consisted of items in which they were asked about their background and experience with instrumental and vocal musical performance. The questionnaire included items that asked whether the participant had taken formal lessons in instrumental or vocal music and received instruction in formal music theory. The participants also were asked to indicate what musical instruments (if any) were played, how long it had been since they had last played each instrument, and the duration of their training on each instrument. Because our attention task involved assessments of timbre and pitch, which have been shown to be influenced by amount of musical training (e.g., see Iverson & Krumhansl, 1992; Pitt, 1994), the number of years that the participant actively played an instrument was selected as the critical variable for evaluating potential effects of musical experience.

Stimuli. All stimuli were generated from four musical tones that were digitized from the McGill University Master Samples library (Opolko & Wapnick, 1987). The tones reflected the combination of two pitch chroma (C4 and F#4) with two instrument timbres (E-flat clarinet and violin [with vibrato]). The selection of timbre was based on the results of a previous experiment (Hall, 2001) that involved the multidimensional scaling of relative perceptual distances between five timbres. The results indicated that violin and clarinet were consistently judged to be the least similar, and therefore, the most perceptually distinct pair of timbres. In the present experiment, we hoped that the reliance on perceptually distinct timbres would aid participants in the quick and accurate identification of those timbres. The fundamental frequencies of the chroma were 262 Hz and 370 Hz, for C4 and F#4, respectively. They were inharmonically related and were separated by more than a critical band (e.g., see Fletcher, 1972). That was done to reduce fusion tendencies (e.g., see Pastore, Schmuckler, Rosenblum, & Szczesiul, 1983), and thereby aid in the separation of auditory events.

Each tone was digitized at a 44.1 kHz sample rate and was truncated to 250 ms from onset. The amplitude of each tone was then ramped off linearly over the final 20 ms and normalized to the maximum allowable voltage without distortion. All tones were equated for root-mean square (RMS) amplitude.
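The editing steps can be sketched as follows. This is a rough illustration only, not the procedure actually used: the function names, the target RMS value, and the sine wave standing in for a sampled instrument tone are all hypothetical, and the intermediate peak-normalization step is omitted because the final RMS scaling determines the output level here.

```python
import math

SAMPLE_RATE = 44100  # Hz, as stated in the text


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def prepare_tone(samples, duration_ms=250, ramp_ms=20, target_rms=0.1):
    """Truncate, apply a linear offset ramp, and scale to a common RMS.

    target_rms is an arbitrary illustrative value, not from the study.
    Assumes the input is at least duration_ms long.
    """
    n_total = int(SAMPLE_RATE * duration_ms / 1000)
    n_ramp = int(SAMPLE_RATE * ramp_ms / 1000)
    tone = list(samples[:n_total])           # keep the first 250 ms
    ramp_start = max(0, len(tone) - n_ramp)
    for i in range(ramp_start, len(tone)):
        # Gain falls linearly to 0 at the final sample.
        tone[i] *= (len(tone) - 1 - i) / n_ramp
    scale = target_rms / rms(tone)            # equate RMS across tones
    return [s * scale for s in tone]


# Stand-in signal: one second of a 262 Hz sine wave (not a real sample).
raw = [math.sin(2 * math.pi * 262 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
tone = prepare_tone(raw)
print(len(tone))  # 11025 samples, i.e., 250 ms at 44.1 kHz
```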

The four isolated tones were sent to either the left or right channel. That created eight stereo stimulus files (four to each channel) consisting of a single-instrument tone (single-instrument condition). Each of the eight stimuli also was mixed with contralateral stimuli from the set to create eight dual-instrument stimuli (dual-instrument condition). The two chroma were never identical in the dual-instrument conditions; one was presented to the left ear (e.g., C4), and the other was presented to the right ear (F#4). Thus, each of the single-instrument tones for a given ear/channel (e.g., violin C4 to the left ear) was separately mixed with the single-instrument tone for the other ear/channel and chroma that reflected either the same-instrument timbre (violin F#4 to the right ear; in this example, same-instrument condition) or the different-instrument timbre (e.g., clarinet F#4, different-instrument condition). The mixing process produced four same-instrument stimuli and four different-instrument stimuli. All stimuli were delivered to listeners at 50 dB(A) through Panasonic stereo RP-HT 355 headphones in a booth in a quiet room. The selection of that low amplitude came from the preceding work of Hall et al. (2004) and minimized potential effects of contralateral masking (e.g., see Newby, 1979).
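The structure of the stimulus set can be checked by enumeration. The sketch below is illustrative only; the labels and data layout are hypothetical, and it simply reproduces the counts given above (eight single-instrument, four same-instrument, and four different-instrument stimuli).

```python
from itertools import product

TIMBRES = ("violin", "clarinet")
CHROMA = ("C4", "F#4")
EARS = ("left", "right")

# Single-instrument stimuli: each of the four tones routed to one channel.
single = [{ear: (timbre, chroma)}
          for ear, timbre, chroma in product(EARS, TIMBRES, CHROMA)]

# Dual-instrument stimuli: the two ears always receive different chroma,
# so the right-ear chroma is fixed once the left-ear chroma is chosen.
dual = []
for left_timbre, left_chroma, right_timbre in product(TIMBRES, CHROMA, TIMBRES):
    right_chroma = CHROMA[1 - CHROMA.index(left_chroma)]
    dual.append({"left": (left_timbre, left_chroma),
                 "right": (right_timbre, right_chroma)})

same = [s for s in dual if s["left"][0] == s["right"][0]]
different = [s for s in dual if s["left"][0] != s["right"][0]]

print(len(single), len(same), len(different))  # 8 4 4
```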

Because musical instrument timbre and pitch have been demonstrated to interact perceptually in speeded classification tasks (Melara & Marks, 1990; Pitt, 1994), pitch represented an additional source of task-irrelevant variation that could have an adverse impact on both performance accuracy and response times. However, neither tendency complicated the interpretation of potential perceptual interference effects. Congruent (i.e., same-instrument) and incongruent (i.e., different-instrument) conditions used identical dichotic pairs of pitches that occurred with equal probability. Thus, pitch should not differentially have an impact on responses for same- and different-instrument conditions.

The added variance in pitch was expected to eliminate facilitation effects, which were absent from the previous work with the task by Hall et al. (2004). Whereas neutral control (i.e., the single-instrument) conditions consisted of only one tone, the congruent (same-instrument) conditions necessarily included an additional pitch that differed from the target. We expected that to increase the cognitive load, which should have slowed response times and possibly reduced performance accuracy relative to neutral trials. Comparisons to the single-instrument condition were still included to address the remote possibility of facilitation.

Procedure

Each participant was run in a single session that took approximately 1 hr. After the participants completed the informed consent documents, they first completed the musical experience questionnaire and then the Operations Span task of working memory. The timbre judgment task constituted the final part of the session. The listeners’ task was to indicate as quickly and accurately as possible the instrument timbre in an assigned target ear while they ignored (tone) information that might have been present in the other ear. Before beginning that task, the listeners were familiarized with each of the four isolated tones (representing the possible conjunctions of 2 timbres and 2 chroma) that would be in the stimulus conditions of the experiment. During the familiarization period, the original (binaural) versions of the isolated tones were presented in an identified, non-random order (i.e., violin F#4, clarinet F#4, violin C4, clarinet C4) at 1,000 ms intervals. The familiarization was repeated three times. No responses were collected during the familiarization period. Instead, the listeners were instructed to listen carefully to each tone so that they could accurately recognize the timbres in the experiment. Although the listeners were told that they could repeat the familiarization sequence as many times as they felt necessary, no one repeated the trials.

Each listener responded to each target ear over different blocks of trials, and the order of target ear assignment was counterbalanced across listeners. For a given target ear, each participant first completed a block of 24 randomized practice trials to ensure that the task was understood. Practice blocks of trials were composed of two repetitions of each of the 12 possible stimuli (the 4 single-instrument stimuli for the assigned target ear, the 4 same-instrument stimuli, and the 4 different-instrument stimuli). The listeners were asked to indicate as soon as possible after the onset of each stimulus which instrument timbre was presented to the target ear.

Practice trials for a given target ear were immediately followed by 240 corresponding experimental trials consisting of 20 randomized repetitions of the 12 stimuli. A brief rest break was provided upon the completion of experimental trials for the initially assigned target ear. The 480 total experimental trials (240 per target ear) consisted of 160 trials each from the single-instrument, same-instrument, and different-instrument conditions.
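The trial counts follow directly from the design, as the sketch below illustrates. The names are hypothetical, and shuffling the full list of 240 trials is an assumption; the description above does not say whether randomization was applied across the whole block or within each repetition of the 12 stimuli.

```python
import random
from collections import Counter

CONDITIONS = ("single", "same", "different")
STIMULI_PER_CONDITION = 4   # per target ear
REPETITIONS = 20


def build_block(target_ear, rng):
    """Return a shuffled list of (ear, condition, stimulus index) trials."""
    stimuli = [(target_ear, condition, index)
               for condition in CONDITIONS
               for index in range(STIMULI_PER_CONDITION)]
    trials = stimuli * REPETITIONS   # 12 stimuli x 20 repetitions
    rng.shuffle(trials)              # assumption: shuffle across the block
    return trials


rng = random.Random(0)               # fixed seed so the sketch is repeatable
session = build_block("left", rng) + build_block("right", rng)
counts = Counter(condition for _, condition, _ in session)

print(len(session))                  # 480
print(counts["single"], counts["same"], counts["different"])  # 160 160 160
```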

The presentation of stimuli, as well as the collection of response and timing data, was controlled by a PC running Kendall’s (2001) Music Experiment Development System (MEDS-2001-A). A 500 ms intertrial interval followed each response before the beginning of the next trial. The listeners made responses on each trial using the dominant hand to press the key on a computer keyboard that corresponded to the first letter of the intended instrument timbre (i.e., C for clarinet, or V for violin). Response times were measured from stimulus onset with 1 ms accuracy.

Results

Analysis Strategy

We established several a priori response criteria to ensure appropriate measures of mean response times as a function of memory span. First, data from those participants whose overall accuracies were below 70% (n = 3) were not used in the analysis. They were excluded to ensure a sufficient number of responses for the analyses of response times, which were limited to responses on correct trials. Second, any response time from an individual trial that was more than 3 standard deviations above the participant’s grand mean, or that was less than 150 ms, was removed as an outlier. Such trials constituted less than 2% of the total data. Third, to assess whether working memory span influenced the size of the interference effect, we calculated responses and response times separately for participants with low and high working memory spans. Participants with span sizes of 2.0-2.5 constituted the low working memory span group (the LWM group; n = 18), and those with a span size of 4.0 or above constituted the high working memory span group (the HWM group; n = 16).
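The response time criteria can be expressed as a simple filter. The sketch below is a minimal illustration with invented data; the function name is hypothetical, and it assumes that the participant's grand mean and standard deviation are computed over that participant's correct trials.

```python
from statistics import mean, stdev


def trim_response_times(trials, floor_ms=150, sd_cutoff=3):
    """Return response times retained for analysis.

    trials is a list of (response_time_ms, is_correct) pairs for one
    participant. Assumption: the grand mean and SD are taken over the
    participant's correct trials.
    """
    correct_rts = [rt for rt, is_correct in trials if is_correct]
    upper = mean(correct_rts) + sd_cutoff * stdev(correct_rts)
    # Drop anticipations (< 150 ms) and slow outliers (> mean + 3 SD).
    return [rt for rt in correct_rts if floor_ms <= rt <= upper]


# Invented data: 20 typical responses, one very slow response, one
# anticipation, and one error trial.
trials = [(900 + 10 * i, True) for i in range(20)]
trials += [(5000, True), (120, True), (950, False)]

kept = trim_response_times(trials)
print(len(kept), min(kept), max(kept))  # 20 900 1090
```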

In the first set of analyses, percentage correct and response times to correct trials were analyzed across the full set of conditions. The median of the 20 repetitions of each stimulus was used to create the condition means. The initial analysis contained 4 within-subjects variables: 2 (ear monitored: right, left) x 2 (target instrument: clarinet, violin) x 2 (pitch chroma: high, low) x 3 (instrument combination: single-instrument, same-instrument [clarinet/clarinet or violin/violin], different-instrument [clarinet/violin]) and one between-subjects variable (working memory span: high, low).
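The aggregation step (a median over repetitions of each stimulus, then a mean over stimuli) can be sketched as follows. The function name, stimulus labels, and response times are invented for illustration, and only three repetitions per stimulus are shown rather than 20.

```python
from statistics import mean, median


def condition_mean(rts_by_stimulus):
    """Average the per-stimulus median response times within a condition."""
    return mean(median(rts) for rts in rts_by_stimulus.values())


# Invented correct-trial response times (ms) for two stimuli in one condition.
same_instrument = {
    "violin_C4": [900, 950, 2000],    # median 950; the slow trial has little effect
    "violin_F#4": [1000, 1100, 1050], # median 1050
}
print(condition_mean(same_instrument))  # 1000
```

Using the median at the repetition level limits the influence of any remaining extreme trials on the condition means.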

The results showed that the working memory span groups did not differ overall: percentage correct, F(1, 32) = .126, p = .73; response time, F(1, 32) = 2.73, p = .11. However, there were significant interactions between working memory span and instrument combination in both percentage correct, F(2, 64) = 4.90, p = .01, and response time analyses, F(2, 64) = 4.79, p = .01.

Because the primary variable of interest, working memory span, did not interact with the variables of ear monitored, target instrument, and pitch chroma, we simplified all further analyses by calculating means collapsed across those variables and conducted a series of 3 (instrument combination: single-instrument, same-instrument, different-instrument) x 2 (working memory span: high, low) analyses of variance (ANOVAs) on the response accuracies and response latencies. Those analyses were followed by evaluations of simple main effects and planned comparisons to examine whether each working memory span group showed significant interference effects.

Response Accuracy

Figure 1 shows the means and standard errors from the 2 (working memory group) x 3 (instrument combination) ANOVA on percentage correct identification. There was no main effect of working memory span; overall, the LWM and HWM groups did not differ, F(1, 32) = .126, p = .73. However, there was a robust main effect of instrument combination, F(2, 64) = 35.13, p < .0001. That was reflected in analyses of simple main effects. Although the same-instrument condition might have represented a somewhat more complex cognitive task than did the single-instrument condition, there was no statistically significant difference between the two conditions, t(33) = 1.57, p = .13. However, there was a robust interference effect no matter which control condition was used. As can be seen in Figure 1, responses in the different-instrument condition were significantly less accurate (74%) than those in both the single-instrument condition (86%), t(33) = 7.36, p < .001, and the same-instrument condition (85%), t(33) = 6.18, p < .001.


When the results were examined separately for the HWM and LWM groups, the pattern for each range of working memory spans mirrored the overall findings. As indicated in Figure 1, similar mean accuracies were obtained in the single-instrument and same-instrument conditions, t(15) = 1.27, p = .22 (HWM), and t(17) = 1.00, p = .33 (LWM). However, responses in the different-instrument condition were significantly less accurate than those in both the single-instrument condition, t(15) = 4.09, p = .001 (HWM), and t(17) = 6.89, p < .001 (LWM), and the same-instrument condition, t(15) = 3.31, p = .005 (HWM), and t(17) = 5.77, p < .001 (LWM).

A major issue was whether the participants with high working memory spans would have a greater ability to inhibit task-irrelevant timbre information, as indicated by a reduced interference effect. The significant interaction between instrument combination and memory span was consistent with that possibility, F(2, 64) = 4.90, p = .01. An inspection of the means in Figure 1 shows that the size of the interference effect (calculated as same-instrument minus different-instrument accuracy) was approximately 6.4% for the HWM group and more than twice that size, 13.9%, for the LWM group. To confirm that statistically, we calculated an interference effect for each participant (different- vs. same-instrument). An independent samples t test showed that the LWM group had significantly larger interference effects than did the HWM group, t(32) = 2.41, p = .02.
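The group comparison can be sketched as a per-participant difference score followed by an independent samples t statistic. The sketch assumes a pooled-variance (Student) test, which is consistent with the reported 32 degrees of freedom for groups of 18 and 16 (18 + 16 - 2 = 32). The function names and accuracy values are invented for illustration and are not the study data.

```python
from math import sqrt
from statistics import mean, variance


def interference(same_accuracy, different_accuracy):
    """Interference effect: same-instrument minus different-instrument accuracy."""
    return same_accuracy - different_accuracy


def pooled_t(group_a, group_b):
    """Independent samples t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Invented (same, different) percentage-correct pairs per participant.
lwm = [interference(s, d) for s, d in [(84, 70), (86, 71), (83, 72), (85, 69)]]
hwm = [interference(s, d) for s, d in [(87, 80), (85, 80), (88, 81), (86, 79)]]

t_value, df = pooled_t(lwm, hwm)
print(round(t_value, 2), df)
```

A positive t value here indicates a larger interference effect in the first group passed to the function.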

Response Latencies

Figure 2 shows means and standard errors for the response times from correct trials. The main effect of working memory span did not reach statistical significance, F(1, 32) = 2.73, p = .11, although the means of the HWM group were somewhat faster than those of the LWM group. Once again, there was a significant main effect of instrument condition, F(2, 64) = 24.47, p < .0001. Although mean performance in the same- and single-instrument conditions did not differ in accuracy, mean response times were significantly slower in the same-instrument condition than were those in the single-instrument condition (1,025 vs. 918 ms), t(33) = 4.72, p < .001. Mean response times for the same-instrument condition also were faster than those for the different-instrument condition (1,085 ms), t(33) = 3.64, p = .001.


Those effects were moderated by a significant interaction between instrument combination and working memory span, F(2, 64) = 4.79, p = .01. In the accuracy analysis, both the HWM and the LWM groups showed an interference effect, but it was larger for the low span participants. An examination of the response times for the two working memory groups (see Figure 2) may provide additional support for the notion that greater working memory capacity confers an advantage in complex attention tasks. LWM participants were much slower to respond in the different-instrument condition than in the same-instrument condition, t(17) = 3.86, p = .001, but they were also slower to respond in the same-instrument condition than in the single-instrument condition, t(17) = 3.74, p = .002. In contrast, the HWM group seemed to pay only a single price for the additional complexity of the stimuli. That is, although they responded faster in the single-instrument condition than in the same-instrument condition, t(15) = 3.19, p = .008, response times from the same-instrument condition did not differ from the different-instrument condition, t(15) = 1.69, p = .26. An independent samples t test comparing the size of the interference effect for LWM and HWM groups confirmed that the LWM group had significantly more interference (Same–Different) than did the HWM group, t(32) = 2.72, p = .009.

In sum, the LWM group was both slower and less accurate when dichotic instrument timbres were incongruent (different-instrument condition) relative to when they were congruent (same-instrument condition). Although participants in the HWM group were similarly less accurate on incongruent trials, their correct responses in that (different-instrument) condition were just as fast as in the congruent (same-instrument) condition.

Musical Experience

One benefit of using musical stimuli in the present investigation is that the participants are less practiced with the stimuli and, therefore, much less likely to reach a ceiling effect. In contrast, in the traditional Stroop task, all normal adults have considerable experience identifying colors and reading simple words. However, when using musical stimuli, musical experience can confound the results if it is not evenly distributed across conditions. Information about the participants’ musical experience therefore was collected before the experimental session. That provided an opportunity to examine whether there were differences based on the level of musical experience and whether those effects were related to working memory span. It is possible that participants who can naturally discriminate and remember musical dimensions such as timbre and pitch may also be more likely to become musicians. If that is so, then on the one hand, musical experience could be directly related to working memory capacity and could jointly predict the ability to control attentional focus. On the other hand, musical experience may help an individual to have a clearer conception of musical timbre and even to separate pitch information, but it may not influence attentional control and goal maintenance. As a result, there may be an influence of musical experience on overall speed and accuracy but not on the size of attentional interference.

To evaluate the roles of both memory span and musical experience in attentional interference, it would be ideal to have musicians and nonmusicians with high and low working memory spans. That would require a much larger sample size than we were able to obtain in the present study. However, our investigation did afford a correlational analysis because there was an ample range along both span and experience variables. Working memory spans were broadly distributed, ranging from 0.5 to 6.0; the mean score was 3.1. Although nearly 30% of the listeners had no musical training, and many others had very little, experience playing a musical instrument also was broadly distributed (ranging from 0 to 18 years). The mean number of years playing a musical instrument across the sample was 5.9 years.

In the first exploratory analysis, we conducted simple bivariate correlations (Pearson’s r) between the number of years that participants reported playing a musical instrument and their score on the Operations Span task. The results showed that for the 54 participants who completed both the Operations Span task and the musical experience questionnaire, there was no relationship between memory span and musical experience, r(54) = .032, p = .86. For the 34 participants whose data were subsequently used in the working memory analysis, there also was no relationship between memory span and musical experience. That finding suggests that listeners with experience playing an instrument were distributed across the working memory span groups.

We then conducted simple bivariate correlations (Pearson’s r) between the number of years that participants reported playing a musical instrument, their score on the Operations Span task, and each of several dependent measures in the timbre judgment task. Those measures included response times and accuracies for the same-, different-, and single-instrument conditions, as well as the size of the interference effects (in terms of both response time and accuracy) calculated from the same- and different-instrument trials. Response times did not correlate significantly with the years playing a musical instrument. However, there was a significant positive correlation between musical experience and accuracy; corresponding coefficients and alpha values (p) are summarized in Table 1 according to instrument condition (single, same, different) and the size of the interference effect (Same–Different). The values in Table 1 show that the correlation between experience and accuracy was strongest in the more difficult different-instrument condition, but also was significant in the same-instrument condition. The overall size of the interference effect for accuracy was negatively related to musical experience (see the right column in Table 1). That suggests that the more experience that a person has playing a musical instrument, the less interference they may experience from the conflicting timbre.
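The correlational analysis described above can be illustrated with a short Python sketch. The listener values are hypothetical and are not taken from the present data set; the function name pearson_r and the variable names are assumptions made for the example. The accuracy interference effect is scored as same-instrument accuracy minus different-instrument accuracy, following the Same–Different label used in Table 1.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical listeners: years playing an instrument and proportion correct.
years = [0, 2, 5, 10, 18]
acc_same = [0.90, 0.92, 0.91, 0.95, 0.96]
acc_diff = [0.70, 0.75, 0.80, 0.88, 0.93]

# Accuracy interference effect per listener (same minus different).
acc_interference = [s - d for s, d in zip(acc_same, acc_diff)]

r_diff = pearson_r(years, acc_diff)
r_interference = pearson_r(years, acc_interference)
print(f"experience vs. different-instrument accuracy: r = {r_diff:.2f}")
print(f"experience vs. interference effect: r = {r_interference:.2f}")
```

In these made-up values, accuracy rises and the interference effect shrinks as experience increases, so the first coefficient is positive and the second is negative, which is the direction of the relationships summarized in Table 1.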

Because of that finding, the basic working memory analysis was rerun using the number of years playing a musical instrument as a covariate in a series of analyses of covariance. The results showed that even when musical experience was factored out, the results remained the same as those described above. There was a main effect of instrument combination for both response accuracy, F(2, 62) = 35.13, p < .0001, and response latencies, F(2, 64) = 5.19, p = .008. Also, the interaction between instrument combination and working memory span remained significant: accuracy, F(2, 62) = 18.85, p < .0001, and latency, F(2, 62) = 4.81, p = .011.


The present results are argued to demonstrate yet another task in which working memory capacity plays a role in performance. Although all the listeners showed some interference from hearing a different instrument in the to-be-ignored ear, the effect was larger for listeners with low working memory spans (the LWM group), which is argued to reflect less working memory capacity. That finding extends previous research insofar as it suggests that working memory is related to executive control of attention in an auditory domain, the processing of musical timbre. The finding also supports the argument that the attentional control processes reflected in measures of working memory capacity may be domain general. In addition, because no word stimuli were presented in the timbre judgment task, the effects of memory span on interference were demonstrated to occur within a single domain of information processing (timbre). Thus, it appears that memory span may have a broad influence on attention by having an impact on performance across modalities, as well as within and across types of processes.

The observed effects of memory span were not obtained simply because listeners in the LWM group were generally worse at judging musical timbre. Working memory span did not predict response times or accuracies for the single-instrument condition. Therefore, when given an isolated tone, the LWM group of listeners was just as successful as the HWM group at focusing on the key dimension and ignoring the irrelevant information of pitch. Furthermore, there were no main effects of working memory span for either response accuracies or latencies.

The major findings also show that our musical timbre judgment task represents an excellent method to examine individual differences in attentional interference. One major benefit of that task is that it avoids the ceiling effects that are often obtained in traditional Stroop tasks. Therefore, there is plenty of room in future studies to evaluate the effects of instructional set, context, and practice on pitch or timbre judgments. Although experience was not explicitly manipulated in the present study, there was some hint that it may play an important role in moderating the size of the interference effect. Those listeners with more experience playing an instrument were somewhat better at the task and tended to have smaller interference effects than did those with less experience. Furthermore, that influence of musical experience was independent of the influence of working memory span.

Application of Dual-Mechanism Accounts

The patterns of obtained response times from the present investigation have important implications for theoretical accounts of Stroop-like interference while also revealing some critical differences between tasks. For example, MacLeod (1998) has argued that Stroop interference and Stroop facilitation are independent phenomena. He argued that responses on congruent trials reflect a mixture of very fast trials in which the goal of color naming is forgotten and the word is read (presumably a faster, more automatic process) and trials of slower correct color-naming responses. Of course, in the typical congruent condition, incorrect word naming and correct color naming cannot be dissociated. In contrast, most theories of the Stroop effect suggest that facilitation and interference are both the result of the same process. Researchers presume that interference reflects competition between the two dimensions of color and word, and in the congruent condition, that convergence leads to facilitation.

According to that logic, having the same timbre in both ears may facilitate the decision. However, that was not the case in the present study. On the contrary, responses were consistently slower in the same-instrument condition relative to the single-instrument condition (see Figure 2). That lack of facilitation should not be considered surprising. The aforementioned task-irrelevant variation in pitch was doubled in the congruent (same-instrument) condition relative to the corresponding neutral (single-instrument) condition, which should have increased task difficulty in the same-instrument condition. Furthermore, in that condition, there was no reason to expect responding to timbre information to be more automatic for distractor tones. Redundant, and thus potentially useful, timbre information from distractor tones reflected the same domain of information processing as the target tone. Thus, if we momentarily assume that attention was distributed across ears, then the average processing of distractor timbres should not be faster or more accurate than it is for target timbres.

Despite such fundamental differences across tasks, the pattern of interference effects from the present investigation is consistent with claims for dual-processing mechanisms of goal maintenance and the resolution of response competition. By following the theoretical framework provided by Kane and Engle (2003), there were a sufficient number of congruent (i.e., same-instrument) trials in the present investigation to make likely the occasional neglect of task goals. Support for that interpretation comes from the fact that listeners made more errors on incongruent (i.e., different-instrument) trials relative to congruent (same-instrument) trials. The fact that the interference effect was significantly greater in listeners with a low working memory span suggests further that working memory capacity affects the efficiency of that goal maintenance mechanism. Thus, one could argue that those listeners had difficulty actively remembering to respond only to the timbre in the assigned target ear.

An independent mechanism for interference is indicated by the pattern of results for response times. Listeners in the LWM group showed slower response times to incongruent (i.e., different-instrument) trials relative to congruent (same-instrument) trials, a finding that is typical of demonstrations of Stroop interference. However, no such difference was obtained for listeners in the HWM group. Rather, those listeners showed evidence of interference only with the typically less sensitive measure of accuracy. That dissociation between response time and accuracy data therefore suggests distinct processing mechanisms. According to the account by Kane and Engle (2003), it seems that individuals with higher working memory capacity are effective at attending to the response associated with the timbre in the target ear and thereby eliminate the competing alternative response associated with the distractor timbre in the nontarget ear. In other words, there was no additional processing cost associated with the resolution of response competition.

The absence of an interference effect in the response times of the HWM group was not anticipated by the results of Kane and Engle (2003). Rather, they consistently obtained interference effects in response time data regardless of the observer’s working memory span. The null effect from the present dichotic listening task is more consistent with results from Long and Prat (2002), who found that individuals with high working memory spans sometimes do not demonstrate Stroop interference.

Observance of the interference effect may depend on the type and probability of congruency conditions. For example, Kane and Engle (2003) suggested that a corresponding lack of interference in Long and Prat (2002) resulted from the inclusion of (a high proportion of) neutral trials, in which a color word was not presented. The present investigation similarly included a type of neutral trial, the single-instrument condition, which occurred with moderate (.33) probability. Alternatively, the lack of an interference effect for the HWM group of listeners could reflect important processing differences between tasks. Researchers have traditionally argued that interference in Stroop phenomena reflects conflicting responses associated with color and word processing. In contrast, any interference that occurred in our nonverbal dichotic listening task must reflect processing conflicts within a single domain of information processing, timbre. It therefore is possible that listeners with greater working memory capacity can more readily attend to and respond to target information while they ignore distractor information along the same processing dimension(s). It is beyond the scope of the present investigation to evaluate directly the validity of either alternative explanation. However, future research that addresses that issue is warranted and is likely to have important implications for the understanding of working memory.

Overall, the results are consistent with the claim by Engle (2002) that working memory is related to executive control of attentional processes. Clearly, listeners with higher working memory spans, and thus, presumably, greater working memory capacity, were more successful at inhibiting timbre information in the to-be-ignored ear. In previous research with the classic Stroop task, robust individual differences were seen in cases in which goal maintenance was particularly difficult because the proportion of incongruent trials was low. Future research can test whether the effect seen in the present study becomes larger or smaller when the proportion of different-instrument trials is increased or decreased from the 33% used here. If Kane and Engle are correct that working memory constrains attentional inhibition, then individual differences should become less pronounced if only different-instrument trials are used and more pronounced if they are rare.


The present study introduced a new and potentially valuable methodology for the investigation of individual differences in working memory. By generalizing theoretical views of attentional interference from the visual color word Stroop task to this nonword, auditory, selective-attention task, it may be possible to investigate in more depth whether a single attentional mechanism may underlie both phenomena. In addition, studies of individual differences may add to our knowledge of the mechanisms responsible for the relationship of working memory to fluid intelligence.

TABLE 1. Pearson’s Correlation Coefficients (r) and Corresponding Alpha Values (p) for the Relationship Between Musical Experience and Performance Accuracy

          Instrument condition
Measure   Single   Same   Different   Same–Different
r(33)     .20      .28    .38         -.30
p         .16      .05    .01         .03

Note. Values are displayed as a function of instrument condition (single, same, different) and the size of the interference effect (Same–Different).


Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89). New York: Academic Press.

Blasko, D. G. (1999). Only the tip of the iceberg: Who understands what about metaphor? Journal of Pragmatics, 31, 1675-1683.

Connine, C. M., Blasko, D. G., & Wang, J. (1994). Vertical similarity in spoken word recognition: Multiple lexical activation, individual differences, and the role of sentence context. Perception & Psychophysics, 56, 624-636.

Conway, A. R. A., Cowan, N., & Bunting, M. F. (2001). The cocktail party phenomenon revisited: The importance of working memory capacity. Psychonomic Bulletin & Review, 8(2), 331-335.

Conway, A. R. A., Cowan, N., Bunting, M. F., Therriault, D. J., & Minkoff, S. R. B. (2002). A latent variable analysis of working memory capacity, short-term memory capacity, processing speed, and general fluid intelligence. Intelligence, 30, 163-183.

Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466.

Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11(1), 19-23.

Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory, and general fluid intelligence: A latent variable approach. Journal of Experimental Psychology: General, 128, 309-331.

Fletcher, H. (1972). Speech and hearing in communication. New York: Krieger.

Gilhooly, K. J., Logie, R. H., Wetherick, N. E., & Wynn, V. (1993). Working memory and strategies in syllogistic-reasoning tasks. Memory & Cognition, 21(1), 115-124.

Hall, M. D. (2001). Auditory feature integration in musical and speech stimuli. Abstracts of the Psychonomic Society, 6, 106.

Hall, M. D., Koch, C., & Griffith, M. (2004). Auditory Stroop-like interference without words. Manuscript submitted for publication.

Iverson, P., & Krumhansl, C. L. (1992). Isolating the dynamic attributes of musical timbre. Journal of the Acoustical Society of America, 94(5), 2595-2603.

Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1), 122-149.

Kane, M. J., & Engle, R. W. (2003). Working memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General, 132(1), 47-70.

Kendall, R. A. (2001). Music Experiment Development System (Version 2001-A) [Computer software]. Los Angeles, CA: Author.

Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working memory capacity?! Intelligence, 14(4), 389-433.

Long, D. L., & Prat, C. S. (2002). Working memory and Stroop interference: An individual differences investigation. Memory & Cognition, 30(2), 294-301.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163-203.

MacLeod, C. M. (1998). Training on integrated versus separated Stroop tasks: The progression of interference and facilitation. Memory & Cognition, 26, 201-211.

Melara, R. D., & Marks, L. E. (1990). Interaction among auditory dimensions: Timbre, pitch, and loudness. Perception & Psychophysics, 48(2), 169-178.

Newby, H. A. (1979). Audiology. Englewood Cliffs, NJ: Prentice-Hall.

Opolko, F., & Wapnick, J. (1987). McGill University Master Samples [CD-ROM]. Montreal, Quebec, Canada:

Pastore, R. E., Schmuckler, M. A., Rosenblum, L., & Szczesiul, R. (1983). Duplex perception with musical stimuli. Perception & Psychophysics, 33, 469-474.

Pitt, M. (1994). Perception of pitch and timbre by musically trained and untrained listeners. Journal of Experimental Psychology: Human Perception & Performance, 20, 976-986.

Shah, P., & Miyake, A. (1996). The separability of working memory resources for spatial thinking and language processing: An individual differences approach. Journal of Experimental Psychology: General, 125(1), 4-27.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.

Turner, M., & Engle, R. (1989). Is memory capacity task dependent? Journal of Memory and Language, 28(2), 127-154.

Manuscript submitted August 27, 2003

Revision accepted for publication February 4, 2004


Department of Psychology

University of Nevada, Las Vegas


Department of Psychology

The Pennsylvania State University at Erie

The authors thank Holly Blasko Drabik and Joshua Rowe for assistance in testing the participants and Matthew Stevenson for assistance with data collection and analysis. The authors also gratefully acknowledge the Executive Editor and anonymous reviewers for their helpful comments.

Address correspondence to Michael D. Hall, Department of Psychology, University of Nevada, Las Vegas, Box 455030, Las Vegas, NV 89154-5030; (e-mail).

COPYRIGHT 2005 Heldref Publications
