Sound must be in a digital format for a digital computer to process it.
See Pohlmann's Advanced Digital
Audio for an excellent discussion of analog-to-digital
conversion of sound.
Digital representations of sound are stored in several different ways
on microcomputers. These representations, whether held on a
long-term storage medium such as a hard drive, in random-access memory, or
flowing through a computer network, are referred to as file formats.
Sound files begin with a "header." A header consists of information
describing the format of that file. Characteristics such as word
length, number of channels, and sampling frequency are specified so
that audio applications can properly read the file.
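As a minimal sketch of what reading a header looks like in practice, the following uses Python's standard `wave` module to build a tiny WAVE file in memory and then read back the characteristics described above. The helper names `make_silent_wav` and `describe_header` are hypothetical, chosen only for this illustration, and other formats (AU, AIFF) lay out their headers differently.

```python
import io
import wave


def make_silent_wav(channels=2, sample_width_bytes=2, rate_hz=44100, frames=100):
    """Build a small WAVE file in memory (hypothetical helper for this sketch)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(rate_hz)
        # One frame holds one sample per channel; write silence.
        wav.writeframes(b"\x00" * (channels * sample_width_bytes * frames))
    return buffer.getvalue()


def describe_header(data):
    """Read the header fields an audio application needs before decoding."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "word_length_bits": wav.getsampwidth() * 8,
            "sampling_frequency_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }


header = describe_header(make_silent_wav())
print(header)
```

The application never has to guess: word length, channel count, and sampling frequency all come from the header before any sample data is interpreted.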
Digital sound file formats can have several features, and a number of
different formats have been established, each with different
characteristics. Each sample can vary in accuracy depending on the
length of each digital word. The words (often loosely referred to as bytes,
although a byte is strictly 8 bits) can be 8 bits long, 16 bits long, etc.
They can be signed or unsigned integers, or in floating-point format.
Generally, a larger word length yields higher fidelity in analog-to-digital
conversion.
Sampling frequency also impacts fidelity. The sampling frequency is
the number of times the sound is measured (sampled) within a
given time period. Sampling frequencies are usually specified in kilohertz
(kHz), meaning thousands of samples per second. The key to understanding how
sampling frequency affects fidelity is the Nyquist sampling theorem.
Basically, when applied to audio signals, the Nyquist theorem states
that the highest frequency that can be represented in the sound is one-half
the sampling frequency.
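The Nyquist limit can be sketched in a few lines. The function names below are hypothetical. The second function goes slightly beyond the text: it illustrates the standard result that a tone above the limit is not simply lost but "folds back" (aliases) to a lower frequency, which is why the limit matters for fidelity.

```python
def nyquist_limit_hz(sampling_frequency_hz):
    """Highest frequency representable at the given sampling frequency."""
    return sampling_frequency_hz / 2


def aliased_frequency_hz(tone_hz, sampling_frequency_hz):
    """Frequency at which a sampled pure tone appears after sampling.

    Tones at or below the Nyquist limit come back unchanged; tones above it
    fold back to the distance from the nearest multiple of the sampling
    frequency.
    """
    nearest_multiple = round(tone_hz / sampling_frequency_hz)
    return abs(tone_hz - nearest_multiple * sampling_frequency_hz)


for rate_hz in (11_100, 22_050, 44_100, 48_000):
    print(rate_hz, "Hz sampling ->", nyquist_limit_hz(rate_hz), "Hz limit")

# A 30 kHz tone sampled at 44.1 kHz exceeds the 22.05 kHz limit.
print(aliased_frequency_hz(30_000, 44_100))
```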
For example, "CD-quality" sound uses 16-bit words sampled at 44.1
kHz. This means 44,100 16-bit words (705,600 bits) are used
to digitally describe each second of sound in each channel of a compact disc. The
highest frequency possible is 22.05 kHz (approximately the top of the human
hearing range), which is half of 44.1 kHz.
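The arithmetic behind these figures is simple enough to check directly. This sketch assumes uncompressed linear samples and ignores any header; the function name `pcm_bits_per_second` is hypothetical.

```python
def pcm_bits_per_second(sampling_frequency_hz, word_length_bits, channels=1):
    """Raw data rate of uncompressed audio, ignoring any file header."""
    return sampling_frequency_hz * word_length_bits * channels


mono_bits = pcm_bits_per_second(44_100, 16)
stereo_bits = pcm_bits_per_second(44_100, 16, channels=2)
stereo_bytes_per_minute = stereo_bits // 8 * 60

print(mono_bits)                # bits per second, one channel
print(stereo_bits)              # bits per second, two channels
print(stereo_bytes_per_minute)  # bytes per minute of stereo sound
```

One channel works out to the 705,600 bits per second quoted above; a stereo recording doubles that, to a little over 10 million bytes per minute.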
Additionally, the samples of a sound file can be represented as linear or
logarithmic progressions. The unit of measurement of the represented
sound pressure is constant from sample to sample in linear encoding,
whereas in logarithmic encoding that unit grows as the sample value
increases. The latter has the advantage of representing a greater range
of sound levels, albeit with higher noise levels. The µ-law and A-law
variations of the AU format, originating from Sun Microsystems and NeXT
Computer, use logarithmic coding. An 8-bit µ-law sample, for example,
can provide roughly the same dynamic range as a 12-bit linearly encoded sample.
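To make the growing step size concrete, here is a sketch of the continuous µ-law curve with µ = 255, applied to samples normalized to the range -1.0 to 1.0. This is the textbook formula rather than a bit-exact telephone codec (practical µ-law encoders use a piecewise-linear approximation of this curve), and the function names are hypothetical.

```python
import math

MU = 255.0  # value of µ used for 8-bit µ-law


def mu_law_compress(x):
    """Map a linear sample in [-1, 1] onto the logarithmic µ-law scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress: recover the linear sample value."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# Assume 127 equal steps on the compressed scale for each polarity.
step = 1 / 127
smallest_step = mu_law_expand(step) - mu_law_expand(0.0)
largest_step = mu_law_expand(1.0) - mu_law_expand(1.0 - step)

print(smallest_step)  # linear size of the step nearest silence
print(largest_step)   # linear size of the step nearest full scale
```

Equal steps on the compressed scale correspond to very fine steps for quiet sounds and much coarser steps near full scale, which is where both the extended range and the extra noise come from.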
Word Length (Length of Bytes)
Sampling Frequency
Sampling Frequencies
Sampling Frequency (kHz) | Common Use |
---|---|
5 | |
7 | |
11 | |
11.1 | Minimum quality currently used on personal computers |
22.050 | Very common in computer sound file formats |
22.354 | |
24 | Minimum acceptable quality needed for speech recognition |
32 | An option on some professional audio equipment and in the 32-bit floating-point IRCAM format (file name suffix of .sf) |
44.1 | The standard for audio compact discs and high-quality personal computer sound |
48 | Used in professional formats, notably Digital Audio Tape (DAT) |
Linear and Logarithmic Representation of Dynamics
Demonstration of Difference | |
---|---|
A linear, 44.1 kHz, 8-bit sample | 260K |
A logarithmic, 44.1 kHz, 8-bit sample | 260K |
Sound files can express one, two, or more channels of information. The vast majority of formats let you create monophonic or stereophonic files, corresponding to the majority of playback equipment. However, as computer-based audio and video authoring capabilities become more advanced, file formats should grow to describe surround sound and other multi-channel formats.
As shown in the example above, quality sound files comprise a large amount of data. Audio compression reduces the amount of physical storage space and memory required to store a sound, and therefore reduces the time required to transfer a file. Compression can be lossy, meaning the sound quality will be negatively affected by compression, or lossless, meaning there will be no change in sound quality. Common audio techniques, such as MACE, MPEG, and ADPCM compression, are lossy, as they provide a great deal of space savings at a reasonable cost in quality; Huffman coding, by contrast, is lossless. Formats that don't employ compression include "raw" audio files and formats such as Apple's sound resource (snd), although the snd format can contain other types of sound representation.
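The difference between lossless and lossy compression can be illustrated with a short sketch. Neither half is a real audio codec. The lossless half uses Python's `zlib` module (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in. The lossy half simply discards the low 8 bits of each 16-bit sample, which is far cruder than MACE, MPEG, or ADPCM. The test signal, a one-second 440 Hz sine wave at 8000 Hz, is an assumption made for the example.

```python
import math
import struct
import zlib

RATE_HZ = 8000
# One second of a 440 Hz sine wave as signed 16-bit samples.
samples = [
    round(20000 * math.sin(2 * math.pi * 440 * n / RATE_HZ))
    for n in range(RATE_HZ)
]
pcm = struct.pack("<%dh" % len(samples), *samples)

# Lossless: the decompressed bytes are identical to the original.
packed = zlib.compress(pcm)
restored = zlib.decompress(packed)

# Lossy: keep only the top 8 bits of each sample (a fixed 2:1 reduction),
# then scale back up. The discarded low bits cannot be recovered.
reduced = [s >> 8 for s in samples]
approx = [v << 8 for v in reduced]
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print(len(pcm), len(packed), restored == pcm)
print(approx == samples, max_error)
```

The lossless path returns exactly the original bytes, while the lossy path returns only an approximation whose error is bounded by the amount of information thrown away.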
The µ-law (pronounced mu-law) file format, for example, is an international standard for compressing voice-quality audio. It has a compression ratio of 2:1. Because it is optimized for speech, it is a standard compression technique for telephone systems in the United States (in Europe, A-law is used). On the Internet it is used for ".au" files, alternatively known as "Sun audio" or "NeXT" format. The G.721, G.723-24, and G.723-40 ADPCM formats are CCITT standards for compressing 8000-Hz, 14-bit samples into a 32-, 24-, or 40-kbps data stream, respectively. These compressed formats have extremely slow decompression rates.
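The data rates quoted for the ADPCM formats follow from the number of bits each one spends per sample at 8000 Hz. The bits-per-sample figures below are not stated in the text above; they are taken from the usual descriptions of these codecs (4, 3, and 5 bits respectively) and should be read as an assumption of this sketch. The 8-bit µ-law rate is included for comparison.

```python
SAMPLING_FREQUENCY_HZ = 8000

# Assumed bits spent per sample by each format (not given in the text above).
BITS_PER_SAMPLE = {
    "G.721": 4,
    "G.723-24": 3,
    "G.723-40": 5,
    "mu-law": 8,
}

rates_kbps = {
    name: SAMPLING_FREQUENCY_HZ * bits // 1000
    for name, bits in BITS_PER_SAMPLE.items()
}

for name, kbps in rates_kbps.items():
    print(f"{name}: {kbps} kbps")
```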
A new compression standard has been proposed by the Interactive Multimedia Association (IMA). The IMA 4:1 audio compression format is intended to compress 16-bit sound with a ratio of 4:1, compressing audio CD-quality sound into one-fourth the space it normally occupies. Apple Computer has integrated IMA 4:1 audio compression into both QuickTime 2.0 and Sound Manager 3.1, and Microsoft has integrated it into Video for Windows.
For more information on digital audio compression, refer to the Digital Equipment Corporation's excellent survey of compression methods. A listing of audio compression hardware can be found at The CERL Sound Group's WWW page.
The most common audio file formats found on the Internet are µ-law (.au), AIFF, WAVE (.wav), Macintosh sound resources (snd), and QuickTime movies (.mov). The single greatest reason for their popularity is cross-platform compatibility and use. Other formats, notably MOD and AIFC files, have attractive characteristics, but have not been widely accepted on UNIX, IBM-PC compatible, and Macintosh platforms. Conversely, newer formats, such as MPEG-compressed audio and video, may become common on the Internet due solely to the comparatively substantial benefits they yield.
Examples of sound files varying in sample rate and word length can be downloaded by selecting one from the table below. The files are in .aiff format, playable by most modern web browsers.
File Format Examples
Unfortunately, these files were lost when this site was transferred from the university servers. Maybe someday I'll recreate them.
Example | Format | Sampling Frequency (kHz) | Word Length (bits) | Size (bytes) |
---|---|---|---|---|
1 | AIFF | 11.1 | 8 | 80K |
2 | AIFF | 11.1 | 16 | 152K |
3 | AIFF | 22 | 8 | 144K |
4 | AIFF | 22 | 16 | 272K |
5 | AIFF | 44.1 | 8 | 272K |
6 | AIFF | 44.1 | 16 | 528K |
7 | AIFF | 48 | 8 | 304K |
8 | AIFF | 48 | 16 | 592K |