Tonal Schemata and Their Role in Streaming

Victor Lombardi
May 5, 1993
Psychology of Music
Dr. Robert Rowe



Attention determines experience. All that we consciously experience in life depends upon what we perceive and how closely we attend to those perceptions. When asleep we experience very little because our senses are largely inactive. Even salient events may be side-stepped: the runner can ignore the pain in her muscles by concentrating her attention on the goal of reaching the finish line. Likewise, the music we hear is a subset of the music around us. We do not attend to the odious musical offering of every passing automobile, for instance. The music we choose to concentrate on is further limited by the constraints of our cognitive operations.

Music sometimes contains complexity too great to be perceived, much less remembered, on a first hearing. Setting aside the overwhelmingly manifold compositions of post-World War II composers, effective perception of the heterogeneous and intricate qualities of music is possible, but it demands that attention be selective. Repeated listening makes it possible to appreciate diverse aspects of a piece by redirecting attention. The quantity and quality of this attention are directly related to the quantity and quality of perception and, eventually, of the memory of that perception.


Our memory of a melody, once perceived, can be stored in different ways. Memorization of a melody requires the storage of precise measurements, such as key, starting pitch, intervals, and rhythm. Storage of these aspects in a less discrete form creates a higher-order representation that still allows recognition of a melody, without the conscious effort of memorization. This more general representation is termed a schema.

The idea of schemata dates back to the philosopher Immanuel Kant, who theorized that representations of sensory material were categorized with the aid of the imagination (Watson 1927). This definition is more ambiguous than inaccurate; the meaning of the term has only grown more specific since Kant. During the first half of the twentieth century the psychologist Frederic Bartlett popularized the term through an influential definition of schemata as involving a series of organized responses corresponding to a series of stimuli (Bartlett 1932). A usable definition of the contemporary meaning is offered by the cognitive psychologist Jean Mandler, who describes a schema as a knowledge structure "formed on the basis of past experience with objects, scenes, or events and consisting of a set of (usually unconscious) expectations about what things look like and/or the order in which they occur" (Mandler 1979). Gjerdingen (1988) points out that these expectations are both activated by and concerned with higher-level events.

Schemata can represent information of various types at different levels and can be embedded one within another (Rumelhart 1980). Here I am specifically concerned with the relationships of pitch within a melody. Tonal schemata are formed through repeated experience with melodic material and are a type of long-term memory. The word tonal is used in contrast to rhythmic or harmonic, not to atonal. This is an important distinction, as we can use atonal material to gauge the use of tonal schemata. The construction of tonal schemata apparently involves two main elements: contour and chroma.

Using atonal melodies, Dowling and Fujitani (Dowling & Harwood 1986) had listeners distinguish an original melody from transpositions, altered-interval imitations, and contour changes. Only when the contour changed did listeners easily distinguish the melodies, achieving between 85% and 90% correct. The experiment was later repeated with tonal melodies, with similar results: while tonality made transpositions easier to tell apart from the original, contour still emerged as the most salient difference. From this evidence we can conclude that tonal schemata include a representation of contour.
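The three kinds of comparison can be made concrete with a small sketch. It assumes melodies are encoded as lists of MIDI note numbers, an encoding chosen here purely for illustration and not one used by Dowling and Fujitani. Contour is taken to be the sequence of up, down, or repeat directions between successive notes, and the note values are a hypothetical (and, for simplicity, tonal) fragment.

```python
# Assumption: melodies are lists of MIDI note numbers (60 = middle C).
# This encoding is chosen for illustration only.


def contour(notes):
    """Direction of each step: +1 up, -1 down, 0 repeated pitch."""
    return [(b > a) - (b < a) for a, b in zip(notes, notes[1:])]


def intervals(notes):
    """Size of each step in semitones (positive = up)."""
    return [b - a for a, b in zip(notes, notes[1:])]


original = [60, 62, 64, 60, 67]             # hypothetical fragment: C D E C G
transposition = [n + 5 for n in original]   # same intervals, shifted up to F
altered_interval = [60, 63, 65, 61, 69]     # same ups and downs, new interval sizes
contour_change = [60, 62, 59, 60, 67]       # second step now moves down instead of up

for name, melody in [
    ("transposition", transposition),
    ("altered-interval imitation", altered_interval),
    ("contour change", contour_change),
]:
    same_contour = contour(melody) == contour(original)
    same_intervals = intervals(melody) == intervals(original)
    print(f"{name}: same contour={same_contour}, same intervals={same_intervals}")
```

A transposition keeps both contour and intervals, an altered-interval imitation keeps only the contour, and a contour change keeps neither, which is consistent with listeners readily distinguishing only the last.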

Contour alone would not provide sufficient information to recall a melody. Dowling and Harwood find evidence suggesting that we store pitches "as a sequence of abstracted chromas (i.e., do-re-mi labels in a movable-do system)" (Dowling & Harwood 1986). While listeners with absolute pitch may store the key of a tune, the key is not a necessary part of the tonal schema. Chromas describe scale position without reference to specific frequencies. Deutsch (1972) found that melodies whose notes were randomly transposed up or down an octave are difficult to recognize (65% success), and Kallman and Massaro (1979) found that melodies whose chromas were shifted by 1 or 2 semitones resemble the original even less (about 10% success). Scale position thus appears vital to the representation, and chroma must be included in any depiction of tonal schemata.
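Chroma in the movable-do sense can be sketched in the same way. The code below is a simplification under stated assumptions: it assumes a major key with a known tonic, labels any non-scale tone as None, and uses hypothetical MIDI pitches. It shows that transposition leaves the chroma labels intact, that octave displacement of the kind Deutsch used also preserves them, and that an extra semitone displacement, roughly analogous to Kallman and Massaro's manipulation, does not.

```python
# Movable-do syllables for a major scale, keyed by semitones above the tonic.
# Assumption: only major-key scale tones are labeled; other tones map to None.
MAJOR_SOLFEGE = {0: "do", 2: "re", 4: "mi", 5: "fa", 7: "sol", 9: "la", 11: "ti"}


def chromas(notes, tonic):
    """Label each MIDI note by its scale position relative to the tonic."""
    return [MAJOR_SOLFEGE.get((n - tonic) % 12) for n in notes]


def displace(notes, octaves, semitones):
    """Shift each note by whole octaves plus an extra per-note semitone offset."""
    return [n + 12 * o + s for n, o, s in zip(notes, octaves, semitones)]


melody_in_c = [60, 62, 64, 60, 67]            # hypothetical fragment: C D E C G
melody_in_g = [n + 7 for n in melody_in_c]    # the same fragment transposed to G

# Octave displacement only: pitch height changes, chroma does not.
scrambled = displace(melody_in_c, [0, 1, -1, 0, 1], [0, 0, 0, 0, 0])
# Octave displacement plus 1 or 2 extra semitones: chroma changes too.
distorted = displace(melody_in_c, [0, 1, -1, 0, 1], [0, 1, -1, 0, 2])

print(chromas(melody_in_c, tonic=60))   # ['do', 're', 'mi', 'do', 'sol']
print(chromas(melody_in_g, tonic=67))   # ['do', 're', 'mi', 'do', 'sol']
print(chromas(scrambled, tonic=60))     # ['do', 're', 'mi', 'do', 'sol']
print(chromas(distorted, tonic=60))     # ['do', None, None, 'do', 'la']
```

Note that the octave-scrambled version keeps its chroma labels while its contour is no longer that of the original, which is one way to see why chroma and contour are treated as separate elements of the schema.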

Recognition and Streaming

Once a schema exists, it can be used for recognition. When attention is focused, a schema permits recognition of a melody through simple association, without the listener even knowing which melody to listen for (that is, which schema to use in matching). Experiments by Dowling (Dowling & Harwood 1986) have shown that tonal schemata and selective attention can be used to parse a sequence of tones. In the experiment, familiar nursery rhymes and folk tunes were interleaved in time, resulting in an unrecognizable sequence of tones. One feature of one melody (stereophonic panning, loudness, pitch range, or timbre) was altered to distinguish it from the other melody. By focusing attention on one melody, listeners were able to ‘hear out’ that melody from the sequence and ignore the other. They were still aware of the competing material, but since it appeared only in what this author terms peripheral hearing, it did not receive enough attention to activate a schema. Considering this experiment, we may posit that the phenomenon is a form of categorical perception, in which we assign incoming melodic material either to the category of a stream or to the category of peripheral material.

This focusing of attention is referred to in the literature as streaming, and the event or collection of events attended to is the stream. Bregman (1990) provides a fine development of the term auditory stream as a "perceptual unit that represents a single happening." A happening may be a collection of sounds and, indeed, a collection of different sounds distinct from others in the environment. Bregman takes delight in using "stream" rather than "sound" for these reasons, but also because the term gives him the flexibility to "load it up with whatever theoretical properties seem appropriate." It is only apt for him to use a different word for a different concept.

For example, we can consider the many diverse sounds produced by an orchestra as one stream. This stream is separate from the sounds of the audience, which constitute another stream. We can willfully direct our attention to one or the other, but attending to both is difficult if not impossible. Gestalt psychologists have named a similar phenomenon in visual perception the Gestalt switch. The comparison with Gestalt theory is appropriate here, for we can think of a schema as a pattern, the core of Gestalt thought.

We can extend this analogy and consider streams the auditory equivalent of visual objects. Streams are the things we examine, and to streams we attribute properties. The study of how streams are perceived is therefore fundamental to understanding auditory cognition. Bregman (1990) employs the stream as "a computational stage on the way to a full description of an auditory event. The stream serves the purpose of clustering related qualities. By doing so, it acts as a center for our description of an acoustic event."

The Experiment

An attempt was made to reproduce the recordings used to test streaming that accompany Dowling and Harwood’s book Music Cognition. Because the sound quality of Dowling’s recording is at times less than ideal, this author set out to reproduce the sine-wave sequences without distortion. A listing of these can be found in Appendix A. To confirm such an interesting phenomenon, the experiment was then repeated. Additional examples were constructed for the sake of variety, so that once a listener had formed a schema for one sequence, a fresh sequence could be used in which the listener would not already know the target stream. The musical notation of these can be found in Appendix B.
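Interleaved sine-tone sequences of this kind can also be generated programmatically. The sketch below uses only the Python standard library to alternate notes from two or more melodies and write them as consecutive sine tones to a WAV file. The tone duration, amplitude, fade length, and the pitches in melody_a and melody_b are hypothetical placeholders, not the values or transcriptions used for the recordings listed in Appendix A; real transcriptions of the tunes would need to be substituted.

```python
import math
import struct
import wave


def midi_to_hz(note):
    """Equal-tempered frequency in Hz, assuming A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def interleave(*melodies):
    """Alternate notes across melodies: a1, b1, a2, b2, ...

    zip() stops at the shortest melody, so longer melodies are truncated.
    """
    return [note for group in zip(*melodies) for note in group]


def render_sine(notes, path, note_dur=0.15, rate=44100, amp=0.3, fade=200):
    """Write the notes as consecutive 16-bit mono sine tones to a WAV file."""
    n_samples = int(note_dur * rate)
    frames = bytearray()
    for note in notes:
        freq = midi_to_hz(note)
        for i in range(n_samples):
            # Linear fade in and out over `fade` samples to avoid clicks.
            env = min(1.0, i / fade, (n_samples - i) / fade)
            sample = amp * env * math.sin(2 * math.pi * freq * i / rate)
            frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))


# Hypothetical placeholder pitches (MIDI numbers), not transcriptions of real tunes.
melody_a = [60, 60, 62, 64, 60, 64, 62, 55]
melody_b = [67, 67, 67, 62, 64, 64, 62, 62]
sequence = interleave(melody_a, melody_b)
```

Calling render_sine(sequence, "interleaved.wav") would write a two-melody sequence with all parameters equal, as in examples 1 and 3. Passing three melodies to interleave gives a three-way sequence in the manner of example 2, although that example used piano and vibraphone timbres rather than sine tones.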

The present experiment took a different approach. Dowling prepared his listeners with "true and false labels" before they heard the sequences (Dowling & Harwood 1986). By supplying expectations, he demonstrated that listeners could filter a stream out of a previously unrecognizable sequence once a schema was in place (65% success with a true label, 0% success with a false label). This author discarded the labels in order to gain a more realistic, if less scientific, impression of how an ordinary person uses schemata in everyday listening. The goal was to find the point at which a schema, formed through repetition, could be used to identify a stream reliably. Reflecting on the results may offer clues about how schemata are used in everyday musical situations, such as picking out a folk melody within a symphonic work.

Unless noted otherwise, the sequences used here were created in the manner of Dowling, without differentiating features (loudness, panning, timbre, or pitch range). The listener was prepared for each sequence with the simple instruction, "Listen to this sequence of tones and tell me if you hear a familiar melody." Listeners’ reactions to one sequence, Yankee Doodle interleaved with Old McDonald, did not follow Dowling’s findings. Listeners usually heard Yankee Doodle amid some interfering tones; with one exception, no one heard Old McDonald within the sequence until it had been played by itself at least three times. Even then, listeners tended to hear the Yankee Doodle stream. The one exception, an elementary school teacher who had recently taught Old McDonald, heard the latter, presumably because of her recently reinforced schema. Several theories have been offered to explain this result.

First, it was suggested that Yankee Doodle falls on the downbeat, which may have greater priority for attention than the upbeat, and that because it starts the sequence it has the first opportunity to attract our attention. A second suggestion was that Yankee Doodle is more dynamic and therefore more interesting, and therefore, according to information theory, more likely to attract our attention, since attending to it would be in our best interests, evolutionarily speaking. If we hear both streams subconsciously and choose Yankee Doodle, this theory might hold, but it seems more likely that we select one melody from the sequence rather than choosing between the two. This author’s personal theory is that Old McDonald, in the first bar at least, is absorbed into the identity of Yankee Doodle by resembling a counter-melody. The information-theory explanation does provide a clue: because Old McDonald is rather static relative to Yankee Doodle, it may be cognitively categorized as musical accompaniment.

The experiments involving the other two sequences of two interleaved melodies (see Appendix A) agreed with Dowling’s basic results. Without labels, listeners could not hear a familiar melody, consistent with Dowling’s results for a false label. When one melody was revealed, it could be partially parsed from the whole; after about three repetitions the entire melody was easily parsed. When the second melody was revealed, however, listeners had difficulty forming a schema, presumably because of strong competition from the first schema. Even after three solo repetitions of the second tune, it was difficult to parse from the sequence. After five or more solo repetitions, the second tune could be parsed, and either stream could be selected at will.

Another question was addressed here: could one melody, given some differentiating factor, be separated from two competing melodies? The results indicate that this is possible. Three melodies were interleaved and, in turn, one of them was assigned a different timbre. The sequence without a timbre difference could not be parsed into streams. With the timbre difference the sequence could be parsed, though not as easily. With two competing melodies as interference, the overall effect of the peripheral material increased: parsing was not more difficult, but the interference became more annoying, as if it were actively fighting for the listener’s attention. The melody easiest to hear as a stream was the third, which has the widest intervals and covers the largest pitch range.


Clever use of streams could serve as a provocative compositional tool. Certainly, repetition of a melody within a composition is an oft-used method of creating a recognizable theme, which can then be used as a leitmotif, for example, or at the higher level of form. Whether attention is attracted with schemata or by some salient feature, it can easily be directed across the spectrum of musical activity in a piece. This shifting focus adds excitement and variety. Of course, providing several levels of activity and allowing the listener to enjoy each stream independently sustains interest over repeated listenings. Schemata and streams can thus be considered useful tools for both the listener and the composer.

Appendix A

Recorded Examples

1. a. Yankee Doodle/Old McDonald: tones are sine waves, all parameters of the two melodies are equal.

b. Yankee Doodle played solo.

c. Old McDonald played solo.

d. Repeat of a.

2. a. Here We Go Round the Mulberry Bush/Mary Had a Little Lamb/Where oh Where Can My Little Dog Be?: piano timbre, all parameters are equal.

b. Here We Go... played with a vibraphone timbre.

c. Mary Had... played with a vibraphone timbre.

d. Where oh Where... played with a vibraphone timbre.

e. Repeat of a.

3. a. London Bridge is Falling Down/Twinkle Twinkle Little Star: piano timbre, all parameters equal.

b. London Bridge played solo.

c. Twinkle Twinkle played solo.

d. Repeat of a.


Bartlett, Frederic C. Remembering: A Study in Experimental and Social Psychology. New York: Cambridge University Press, 1932.


Bregman, Albert S. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press, 1990.


Deutsch, D. "Octave Generalization and Tune Recognition." Perception and Psychophysics. 1972, No. 11.


Dowling, W. Jay and Dane L. Harwood. Music Cognition. Orlando, FL: Academic Press, 1986.


Gjerdingen, Robert O. A Classic Turn of Phrase: Music and the Psychology of Convention. Philadelphia: University of Pennsylvania Press, 1988.


Kallman, H.J. and D.W. Massaro. "Tone Chroma is Functional in Melody Recognition." Perception and Psychophysics. 1979, No. 26.

Mandler, Jean Matter. "Categorical and Schematic Organization in Memory." Memory Organization and Structure. New York: Academic Press, 1979.


Rumelhart, David E. "Schemata: The Building Blocks of Cognition." Theoretical Issues in Reading Comprehension. Hillsdale, NJ: Lawrence Erlbaum Associates, 1980.


Watson, John. The Philosophy of Kant as Contained in Extracts from His Own Writings. Glasgow: Jackson, Wylie, and Company, 1927.