Use of psychoacoustic principles for the design of audio recording,
reproduction, and data reduction devices makes perfect sense. Audio
equipment is intended for interaction with humans, with all our abilities
and limitations of perception. Traditional audio equipment attempts to
produce or reproduce signals with the utmost fidelity to the original. A
more appropriately directed, and often more efficient, goal is to achieve the
fidelity perceivable by humans. In practice, this means removing the parts
of an audio signal that we cannot hear. This is the goal of perceptual coders.
Although one main goal of digital audio perceptual coders is data
reduction, this is not a necessary characteristic. Perceptual coding can be
used to improve the representation of digital audio through advanced bit
allocation. Conversely, not all data reduction schemes are perceptual
coders. Some systems, the DAT 16/12 scheme for example, achieve data
reduction by simply reducing the word length, in this case cutting off four
bits from the least-significant side of the 16-bit data word, achieving a
25% reduction.
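As a rough illustration of word-length reduction as described above, the following Python sketch drops the four least-significant bits of a 16-bit sample. The function names and sample values are hypothetical, and the sketch shows plain truncation only; it is not a model of how any particular DAT machine implements its 12-bit mode.

```python
def truncate_word(sample: int, from_bits: int = 16, to_bits: int = 12) -> int:
    """Drop the least-significant bits of a signed PCM sample (hypothetical helper)."""
    shift = from_bits - to_bits
    # Python's >> is an arithmetic shift, so the sign of the sample is preserved.
    return sample >> shift


def restore_word(sample: int, from_bits: int = 12, to_bits: int = 16) -> int:
    """Scale a shortened sample back to the original range; the dropped bits stay lost."""
    return sample << (to_bits - from_bits)


samples = [0, 16, 1000, -1000, 32767, -32768]          # illustrative 16-bit values
shortened = [truncate_word(s) for s in samples]
restored = [restore_word(s) for s in shortened]

# Four of sixteen bits removed: a 25% saving, independent of the signal content.
saving = 1 - 12 / 16

# The error introduced is always smaller than one 12-bit step (16 units at 16 bits).
errors = [s - r for s, r in zip(samples, restored)]
```

Note that the saving is fixed regardless of what the signal contains, which is the contrast with a perceptual coder: nothing here asks whether the discarded bits were audible.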
The Digital Compact Cassette (DCC), developed by Philips, is one of the first
commercially available forms of perceptually coded media. It reduces the
data to roughly 25% of its original size (about 4:1) through the use of the
Precision Adaptive Sub-band Coding (PASC) algorithm. The algorithm contains
a psychoacoustical
model of masking effects as well as a representation of the minimum
hearing threshold. The masking function divides the frequency spectrum
into 32 equally spaced bands. Sony's ATRAC system for the MiniDisc
format is similar.
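To make the idea of 32 equally spaced bands concrete, here is a minimal Python sketch that computes the band edges and finds which band a given frequency falls into. The 44.1 kHz sampling rate and the function names are assumptions made for illustration; the sketch shows only the band layout, not the PASC filter bank or its masking model.

```python
def subband_edges(sample_rate_hz: float = 44100.0, num_bands: int = 32):
    """Return (low, high) edges in Hz for equal-width bands from 0 Hz to Nyquist."""
    nyquist = sample_rate_hz / 2
    width = nyquist / num_bands
    return [(i * width, (i + 1) * width) for i in range(num_bands)]


def band_index(freq_hz: float, sample_rate_hz: float = 44100.0, num_bands: int = 32) -> int:
    """Return the index of the equal-width band that contains freq_hz."""
    nyquist = sample_rate_hz / 2
    if not 0 <= freq_hz < nyquist:
        raise ValueError("frequency must lie between 0 Hz and the Nyquist frequency")
    return int(freq_hz // (nyquist / num_bands))


edges = subband_edges()
band_width = edges[0][1] - edges[0][0]   # 22050 / 32 Hz per band at 44.1 kHz
band_of_1khz = band_index(1000.0)        # a 1 kHz tone lands in the second band
```

With these assumptions each band is about 689 Hz wide, so a coder working this way would decide how many bits to spend on each of the 32 bands according to what the masking model says is audible there.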
Perceptual coders still have room for improvement but are headed in what
seems to be a more intelligent direction. The algorithms are not perfect
models of human perception and cognition. Of course, while the modeling
of a perceptual coder could be over-engineered in the spirit of cognitive
science in order to learn more about human cognition, all that is necessary
in perceptual coding is to develop an algorithm that operationally
corresponds to human auditory perception, not one that physically copies it.

The Future of Perceptual Coding

It is probable that all future coding schemes that make any claim to
sophistication will make use of psychoacoustical principles. While the
present commercial systems, PASC and ATRAC, were instituted in the
interest of economy of storage, there are other valuable functions for
perceptual coders. Transfer over networks, presently a time-consuming
function when sending large, high-quality audio files, is a prime example of
where perceptual coding is needed. Consider a relatively fast connection
to the Internet: a T1 line, able to transfer data at approximately
1.5 Mbit/sec., requires a little over four and a half minutes to
send five minutes of CD-quality stereo digital audio. Assuming PASC's
reduction to roughly 25% of the original size, the same amount of digital
audio could be sent in just over one minute. Additionally, the perceptually
coded material may sound better if dynamic bit allocation is used. If the
coding is performed in real time, as it is in some systems, then the
effective transfer rate between the central processing unit and the Internet
connection at both the sending and receiving ends would also increase.
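The transfer-time arithmetic can be checked with a short Python sketch. It assumes the nominal T1 line rate of 1.544 Mbit/s, standard CD audio parameters (44.1 kHz, 16 bits, two channels), and a coder that reduces the data to 25% of its original size; it ignores protocol overhead and line framing, so real transfers would be somewhat slower. The constant and function names are hypothetical.

```python
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2
T1_BITS_PER_SECOND = 1_544_000   # nominal T1 rate; assumption, overhead ignored
CODED_FRACTION = 0.25            # assumed size after perceptual coding


def audio_bits(duration_s: float) -> float:
    """Size in bits of uncompressed CD-quality stereo audio of the given length."""
    return duration_s * CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def transfer_seconds(payload_bits: float, link_bps: float = T1_BITS_PER_SECOND) -> float:
    """Idealized transfer time: payload size divided by link rate."""
    return payload_bits / link_bps


pcm_bits = audio_bits(5 * 60)                  # five minutes of audio
coded_bits = pcm_bits * CODED_FRACTION

pcm_minutes = transfer_seconds(pcm_bits) / 60
coded_minutes = transfer_seconds(coded_bits) / 60

print(f"uncoded: {pcm_minutes:.1f} min, coded: {coded_minutes:.1f} min")
```

Under these assumptions the uncoded audio takes roughly as long to send as it does to play, while the coded version arrives in about a quarter of that time.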
Other applications include stand-alone converter modules for
conversion to any medium and, eventually, software encoders. The need for
standardization soon becomes apparent, and hopefully it will be met.