
  Investigating the Implementation of Audio in Human-Computer Interfaces

Victor Lombardi


My research explored the uses of non-speech audio feedback in Human-Computer Interaction. A number of studies have shown that audio contributes to the interaction process, providing a richer, more robust environment than graphic feedback alone. Auditory feedback can present further information when the bandwidth of graphic information has been exhausted, as is often the case with the present emphasis on graphic presentation. By expanding conventional interfaces in another dimension, sounds can make tasks easier and more productive. Other studies have even shown certain types of information to be represented better by sound than through graphics or text. Additionally, audio feedback may complement graphics and text to create valuable redundancy, reinforcing or reconfirming a concept in the user's mind.

Many systems use robust audio feedback with great success. The telephone is a very general example that relied on audio feedback exclusively, at least until the introduction of the video phone. Perhaps the most intense interactive systems in general use are video games. Despite their non-professional nature, they use audio feedback extensively to aid in fast-paced interaction. It has been shown that performance in such applications drops when the audio feedback is removed (Crispien 1993). Perhaps audio feedback would let a user wield a business or scientific application with greater speed and ease. In doing so, the productive process might become more enjoyable as well.

Audio also promises to bridge the gap between sight-impaired users and graphical user interfaces (GUIs). It is ironic that the desktop/windows systems on personal computers have enabled so many people to become computer literate and yet have abandoned the blind, who in the past have been productive using text-based interfaces and refreshable Braille modules (Crispien 1993). Various groups, including the Mercator Project at the Georgia Institute of Technology and the GUIB project funded by the European Community, are working on audible complements for the desktop metaphor. Emphasis has been placed on determining the best sound to correspond with each action, finding sounds which work effectively in combination, and developing a system of localization so that sounds are correctly associated with their respective graphic objects.

Yet, as Braille is not a literal physical representation of characters, perhaps a metaphor for the blind should not be a literal representation either. Menus and windows are not objects which the blind relate to, so why try to make their use audible? The blind do have a rich auditory experience that could provide for a unique metaphorical environment.

Foreseeable Problems

Because our sense of hearing is less directional than our sense of sight, and because we lack "earlids," audio will affect others present in the same environment as the computer system. A method of continuous sound, such as repetitive music, has been proposed to create a sort of drone with intermittent salient features to provide information, ideally with less annoyance to bystanders. Another possibility is a sort of active level sensing that would alter the volume of the system depending on the level of environmental noise, using a real-time limiting/expanding algorithm.
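
As a rough illustration of the level-sensing idea, the sketch below measures the loudness of a block of ambient samples and maps it to a playback gain, then eases toward that gain so the volume does not jump. It is only a sketch under stated assumptions: the function names, the thresholds, and the smoothing rate are all hypothetical, and a simple interpolated gain stands in for a true limiting/expanding algorithm.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples in the range -1.0..1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def target_gain(ambient_rms, quiet=0.01, loud=0.3, min_gain=0.2, max_gain=1.0):
    """Map ambient level to a feedback gain (all thresholds are assumed values).

    A quiet room gets soft feedback, a noisy room gets louder feedback.
    """
    if ambient_rms <= quiet:
        return min_gain
    if ambient_rms >= loud:
        return max_gain
    # Interpolate on a logarithmic scale, since loudness is perceived
    # roughly logarithmically.
    position = math.log(ambient_rms / quiet) / math.log(loud / quiet)
    return min_gain + position * (max_gain - min_gain)


def smooth_gain(previous, target, rate=0.1):
    """Move a fraction of the way toward the target gain on each update."""
    return previous + rate * (target - previous)


if __name__ == "__main__":
    gain = 0.2
    # Hypothetical ambient readings: a quiet office that becomes noisy.
    for ambient in (0.005, 0.02, 0.1, 0.3, 0.3):
        gain = smooth_gain(gain, target_gain(ambient), rate=0.5)
        print(f"ambient={ambient:.3f} gain={gain:.2f}")
```

In a real system the ambient samples would come from a microphone, and the measurement would need to exclude the computer's own sounds.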

While a system employing headphones would solve this problem, many users do not like the feel of being "hooked up" to the computer, and so prefer not to wear headphones.

It seems reasonable to assume that the sounds used to represent a particular event will be a factor in whether the sounds are effective or not. As pointed out above, one criterion for the selection of sounds could be consistency with the physical world. We could start by making a button's sound consistent with a button's function. As each type of button in real life has its own texture and resistance, each type of software button could have its own sound. As familiarity with the sound grows, the user will associate a sound not only with its respective button, but also with the button's function. So not only will the user know that the click they executed was registered, they will also know, with utter immediacy, what action was performed, without having to read a label or other description within a dialog box. Graphics and text could then be used to provide other information, thereby widening the effective bandwidth of conveyed information.
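
One way to picture this pairing of sound and function is a small lookup table, sketched here in Python. Everything in it is an assumption made for illustration: the button functions, the sample file names, and the play callback are hypothetical and do not come from any existing toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditoryIcon:
    sample: str   # hypothetical sample file name
    meaning: str  # the function the user learns to associate with the sound


# Each button function gets its own distinct sound (illustrative choices only).
BUTTON_SOUNDS = {
    "ok": AuditoryIcon("latch_click.wav", "confirm the dialog"),
    "cancel": AuditoryIcon("drawer_shut.wav", "dismiss without changes"),
    "delete": AuditoryIcon("paper_crumple.wav", "discard the selection"),
}

# Fallback so that every press still gives some audible confirmation.
DEFAULT_ICON = AuditoryIcon("soft_tap.wav", "generic press")


def sound_for(button_function):
    """Return the sound tied to a button's function, not to its appearance."""
    return BUTTON_SOUNDS.get(button_function, DEFAULT_ICON)


def on_press(button_function, play=print):
    """Play the sound for a press; 'play' stands in for a real audio call."""
    icon = sound_for(button_function)
    play(icon.sample)
    return icon


if __name__ == "__main__":
    on_press("ok")
    on_press("delete")
```

Keying the table on the button's function rather than on its label or position is what would let the sound alone tell the user which action was performed.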

Several other interesting questions arise which the literature has not yet answered. Assuming a pointing-device driven interface:

In many GUIs, when a graphic button is "pushed," it momentarily switches to inverse-field to show the user that his/her choice has registered.

Question: Would the addition of sound to this event provide redundant information and therefore better feedback? Or would this sound merely be superfluous and/or annoying?

Question: Could we replace the inverse-field effect with a sound and still have equally effective feedback? Would this be in keeping with most real-life buttons, which do not always change appearance when depressed (as is the norm in graphical user interfaces), but often offer audio feedback?

I have started to extend this line of thought to include continuous actions, such as scrolling. These actions would call for a continuous sound, a loop for example, to represent the continued action of holding down the button on a pointing device.

On the grand scale, it would be nice to invent a metaphor to encompass all the sounds used. If all the sounds were related to a single activity such as carpentry, or baking, or the sounds of a farm, the sounds would take on further meaning. Not only would they be associated with a particular function, but they would also be associated with their respective metaphoric action, and therefore the sounds would have meaning relative to one another, because the analogous actions within an activity have meaning relative to one another.

Once the interface designer has determined what sound is most appropriate, that sound must still be created or sampled somehow. It has been pointed out that most synthesis programs that exist today are intended to aid the musician, not the interface designer. An application does not yet exist that allows one to specify parameters in plain language to create sounds which represent a particular object or action. Instead of acting on an intuition such as "I want my sound more woody" or "I want it more metallic," the designer must use various technical means in an experimental manner to achieve such results.
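
To suggest what a plain-language layer might look like, the sketch below maps the words "woody" and "metallic" to two synthesis parameters and renders a short decaying tone from them. This is an assumption-laden toy, not an existing application or a published method: the parameter names and every numeric value are guesses, chosen only so that "woody" dies away quickly with harmonic partials while "metallic" rings longer with inharmonic ones.

```python
import math

# Hypothetical mapping from plain-language descriptors to synthesis
# parameters. All values are illustrative guesses.
MATERIALS = {
    "woody": {"decay_per_second": 40.0, "partial_ratios": (1.0, 2.0, 3.0)},
    "metallic": {"decay_per_second": 4.0, "partial_ratios": (1.0, 2.76, 5.4)},
}


def synthesize(material, base_hz=440.0, seconds=0.5, sample_rate=8000):
    """Render a struck-object tone as a list of samples in -1.0..1.0."""
    params = MATERIALS[material]
    ratios = params["partial_ratios"]
    samples = []
    for n in range(int(seconds * sample_rate)):
        t = n / sample_rate
        # An exponential envelope models how quickly the material damps.
        envelope = math.exp(-params["decay_per_second"] * t)
        # Sum the partials; dividing by their count keeps the peak within 1.0.
        tone = sum(math.sin(2 * math.pi * base_hz * r * t) for r in ratios)
        samples.append(envelope * tone / len(ratios))
    return samples


if __name__ == "__main__":
    for name in MATERIALS:
        tail = synthesize(name)[-1000:]
        energy = sum(s * s for s in tail)
        print(f"{name}: tail energy {energy:.6f}")
```

A designer-facing tool would need many more descriptors and a way to blend them, but the principle is the same: the words select the parameters so that the designer does not have to.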


Crispien, Kai, and Helen Petrie. Providing Access to GUIs for Blind People Using a Multimedia System Based on Spatial Audio Presentation. Preprint from the 95th Convention of the Audio Engineering Society. New York: Audio Engineering Society, 1993.

Gaver, William W., et al. Effective Sounds in Complex Systems: The ARKola Simulation. Proceedings of ACM CHI'91 Conference on Human Factors in Computing Systems. New York: Association for Computing Machinery, 1991.

Gaver, William W., moderator. Perceptual vs. Hardware Performance in Advanced Acoustic Interface Design (panel). Proceedings of ACM CHI'93 Conference on Human Factors in Computing Systems. New York: Association for Computing Machinery, 1993.

Gaver, William W. Synthesizing Auditory Icons. Proceedings of ACM INTERCHI'93 Conference on Human Factors in Computing Systems. New York: Association for Computing Machinery, 1993.

Mountford, S. Joy and William W. Gaver. Talking and Listening to Computers. The Art of Human-Computer Interface Design. Laurel, Brenda, ed. Reading, MA: Addison Wesley, 1990.

Mynatt, Elizabeth D. and W. Keith Edwards. Mapping GUIs to Auditory Interfaces. Proceedings of the ACM SIGGRAPH Symposium on User Interface Software and Technology (UIST '92). New York: Association for Computing Machinery, 1992.

Tognazzini, Bruce. Tog on Interface. Reading, MA: Addison-Wesley, 1992.

Various communication forums on the Internet concerned with computer-human interaction issues provided under the auspices of the Association for Computing Machinery.



  Copyright 1999-2002 Victor Lombardi