Audio on The Internet

Perceptual Coding

Use of psychoacoustic principles for the design of audio recording, reproduction, and data reduction devices makes perfect sense. Audio equipment is intended for interaction with humans, with all our abilities and limitations of perception. Traditional audio equipment attempts to produce or reproduce signals with the utmost fidelity to the original. A more appropriately directed, and often more efficient, goal is to achieve the fidelity perceivable by humans. Basically, this means removing the part of an audio signal we cannot hear. This is the goal of perceptual coders.

Although data reduction is one main goal of digital audio perceptual coders, it is not a necessary characteristic. Perceptual coding can also be used to improve the representation of digital audio through advanced bit allocation. Conversely, not all data reduction schemes are perceptual coders. Some systems, the DAT 16/12 scheme for example, achieve data reduction by simply reducing the word length, in this case cutting four bits from the least-significant end of the 16-bit data word for a 25% reduction.
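The word-length reduction described above can be sketched in a few lines. This is only an illustration of plain truncation from 16 to 12 bits as the text describes it; the function names are made up for the example and do not come from any DAT specification.

```python
def truncate_word(sample: int, from_bits: int = 16, to_bits: int = 12) -> int:
    """Drop the least-significant bits of a signed integer sample.

    Hypothetical helper: a plain truncation, with no perceptual model.
    """
    shift = from_bits - to_bits
    # Python's >> on ints is an arithmetic shift, so the sign is preserved.
    return sample >> shift


def reduction_fraction(from_bits: int = 16, to_bits: int = 12) -> float:
    """Fraction of the data removed by shortening each word."""
    return (from_bits - to_bits) / from_bits


samples = [12345, -12345, 7, -1]
print([truncate_word(s) for s in samples])
print(f"data removed: {reduction_fraction():.0%}")
```

Note that every sample loses the same four bits regardless of what the listener could actually hear, which is exactly what separates this kind of scheme from a perceptual coder.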

The Digital Compact Cassette (DCC), developed by Philips, is one of the first commercially available forms of perceptually coded media. Its Precision Adaptive Sub-band Coding (PASC) algorithm reduces the data to roughly 25% of its original size, a compression ratio of about 4:1. The algorithm contains a psychoacoustical model of masking effects as well as a representation of the minimum hearing threshold. It divides the frequency spectrum into 32 equally spaced bands. Sony's ATRAC system for the MiniDisc format is similar in principle.
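As a rough illustration of the two ingredients mentioned above, the sketch below computes the edges of 32 equal-width bands and evaluates a hearing threshold at a few band centres. Two things here are assumptions rather than facts from this text: the 44.1 kHz sampling rate, and the threshold formula, which is a widely quoted approximation attributed to Terhardt and not necessarily the curve PASC itself uses. The function names are invented for the example.

```python
import math


def subband_edges(sample_rate_hz: float = 44_100.0, n_bands: int = 32):
    """Return (low, high) edges in Hz of equal-width bands from 0 Hz to Nyquist."""
    width = (sample_rate_hz / 2.0) / n_bands
    return [(i * width, (i + 1) * width) for i in range(n_bands)]


def band_index(freq_hz: float, sample_rate_hz: float = 44_100.0, n_bands: int = 32) -> int:
    """Index of the band containing freq_hz (the Nyquist frequency maps to the last band)."""
    width = (sample_rate_hz / 2.0) / n_bands
    return min(int(freq_hz // width), n_bands - 1)


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate threshold of hearing in dB SPL (assumed formula, freq_hz > 0)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


for low, high in subband_edges()[:4]:
    centre = (low + high) / 2.0
    print(f"{low:8.1f} - {high:8.1f} Hz   threshold ~ {threshold_in_quiet_db(centre):5.1f} dB SPL")
```

A real coder would combine a threshold like this with signal-dependent masking in each band before deciding how many bits the band deserves; that step is omitted here.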

Perceptual coders still have room for improvement but are headed in what seems to be a more intelligent direction. The algorithms are not perfect models of human perception and cognition. Of course, while the modeling of a perceptual coder could be over-engineered in the spirit of cognitive science in order to learn more about human cognition, all that is necessary in perceptual coding is to develop an algorithm that operationally corresponds to human auditory perception, not one that physically copies it.

The Future of Perceptual Coding

It is probable that all future coding schemes that make any claim to sophistication will make use of psychoacoustical principles. While the present commercial systems, PASC and ATRAC, were introduced in the interest of economical storage, perceptual coders have other valuable uses. Transfer over networks, presently time-consuming for large, high-quality audio files, is a prime example of where perceptual coding is needed. Consider a relatively fast connection to the Internet: a T1 line, which carries about 1.5 Mbit/s, needs roughly four and a half minutes to send five minutes of CD-quality stereo digital audio (about 53 MB). If PASC reduces the data to about 25% of its original size, the same audio could be sent in a little over one minute. Additionally, the perceptually coded material may sound better if dynamic bit allocation is used. If the coding is performed in real time, as it is in some systems, the speed of transfer between the central processing unit and the Internet connection at each sending and receiving point would also increase.
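The transfer-time arithmetic can be checked with a short calculation. The sketch below assumes the nominal T1 line rate of 1.544 Mbit/s with no protocol overhead, standard CD parameters (44.1 kHz, 16 bits, two channels), and a coded stream one quarter the size of the original; all three are assumptions of this example, and the result changes directly with whichever link rate and coded size are assumed.

```python
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2
T1_BITS_PER_SECOND = 1_544_000   # assumed nominal line rate, overhead ignored
CODED_FRACTION = 0.25            # assumed size of the coded stream vs. the original


def audio_bits(duration_s: float) -> float:
    """Size in bits of uncompressed CD-quality stereo audio."""
    return duration_s * CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def transfer_seconds(payload_bits: float, link_bps: float = T1_BITS_PER_SECOND) -> float:
    """Idealized time to push payload_bits through a link of link_bps."""
    return payload_bits / link_bps


five_minutes = audio_bits(5 * 60)
raw_time = transfer_seconds(five_minutes)
coded_time = transfer_seconds(five_minutes * CODED_FRACTION)

print(f"uncompressed size: {five_minutes / 8 / 1e6:.1f} MB")
print(f"uncompressed transfer: {raw_time / 60:.1f} min")
print(f"coded transfer:        {coded_time / 60:.1f} min")
```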

Other applications include stand-alone converter modules for conversion to any media and, eventually, software encoders. The need for standardization soon becomes apparent, and hopefully it will be met.

