Audio on The Internet

FILE FORMATS

Word Length
Sampling Frequency
Linear and Logarithmic Representation of Dynamics
Channels
Compression
File Format Examples

Sound must be in a digital format for a digital computer to process it. See Pohlmann's Advanced Digital Audio for an excellent discussion on analog to digital conversion of sound.

Digital representations of sound are stored in several different ways on microcomputers. These representations, whether held on a long-term storage medium such as a hard drive, held in random-access memory, or flowing through a computer network, are referred to as file formats.

Sound files begin with a "header." A header consists of information describing the format of that file. Characteristics such as word length, number of channels, and sampling frequency are specified so that audio applications can properly read the file.
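As a rough illustration of the header described above, the sketch below uses Python's standard wave module to build a small WAVE file in memory and then read back the fields an audio application would need. WAVE is used only because the standard library supports it directly; the helper names make_wav and read_header are hypothetical and exist only for this example.

```python
import io
import wave


def make_wav(sample_rate, channels, bytes_per_sample, n_frames):
    """Build a small WAVE file in memory and return its raw bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bytes_per_sample)
        wav.setframerate(sample_rate)
        # The sample data is all zero bytes; only the header matters here.
        wav.writeframes(bytes(n_frames * channels * bytes_per_sample))
    return buffer.getvalue()


def read_header(data):
    """Return the format fields stored in the file's header."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "word_length_bits": wav.getsampwidth() * 8,
            "sampling_frequency_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }


if __name__ == "__main__":
    print(read_header(make_wav(44100, 2, 2, 100)))
```

The reader never has to be told the format in advance: word length, channel count, and sampling frequency all come from the header itself.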

Word Length (Length of Bytes)

Digital sound file formats can have several features, and a number of different formats have been established, each with different characteristics. Each sample can vary in accuracy depending on the length of each digital word. The words (loosely referred to as bytes, although a byte is strictly 8 bits) can be 8 bits long, 16 bits long, and so on. They can hold signed or unsigned integers, or floating-point values. Generally, a larger word length gives higher fidelity in analog-to-digital conversion.
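To make the link between word length and fidelity concrete, the following sketch quantizes one cycle of a full-scale sine wave to a given number of bits and reports the largest rounding error. It is a simplified model that assumes samples are scaled to the range -1.0 to 1.0; the function names are hypothetical.

```python
import math


def quantize(value, bits):
    """Round a value in [-1.0, 1.0] to the nearest signed integer level."""
    max_level = 2 ** (bits - 1) - 1
    return round(value * max_level)


def to_unsigned_8bit(signed_sample):
    """Shift a signed 8-bit sample (-128..127) into unsigned form (0..255)."""
    return signed_sample + 128


def max_quantization_error(bits, n_points=1000):
    """Largest round-trip error over one cycle of a full-scale sine wave."""
    max_level = 2 ** (bits - 1) - 1
    worst = 0.0
    for i in range(n_points):
        x = math.sin(2 * math.pi * i / n_points)
        restored = quantize(x, bits) / max_level
        worst = max(worst, abs(x - restored))
    return worst


if __name__ == "__main__":
    for bits in (8, 16):
        print(bits, "bits -> worst error", max_quantization_error(bits))
```

Under these assumptions the worst-case error is bounded by half of one quantization step, so each additional bit roughly halves it.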

Sampling Frequency

Sampling frequency also impacts fidelity. The sampling frequency is the number of times the sound is measured (sampled) within a given time period. Sampling frequencies are specified in kilohertz (KHz), meaning thousands of samples per second. The key to understanding how sampling frequency affects fidelity is the Nyquist sampling theorem. Applied to audio signals, the theorem states that the highest frequency (pitch) that can be captured is one-half of the sampling frequency.

For example, "CD-quality" sound requires 16-bit words sampled at 44.1 KHz. Essentially this means 44,100 16-bit words (705,600 bits) are used to digitally describe each second of sound on a compact disc. The highest pitch possible is 22.05 KHz (approximately the top of human hearing range), which is half of 44.1 KHz.
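The arithmetic in this example can be checked with a short sketch. The function names are hypothetical. The alias_frequency_hz helper goes one step beyond the text: it assumes a pure tone sampled without an anti-aliasing filter and shows where a tone above the Nyquist limit would fold back to.

```python
def bits_per_second(sample_rate_hz, word_length_bits, channels=1):
    """Raw (uncompressed) data rate of a sampled audio stream."""
    return sample_rate_hz * word_length_bits * channels


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that can be captured at this sampling frequency."""
    return sample_rate_hz / 2


def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency actually recorded when a pure tone is sampled unfiltered."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    print(bits_per_second(44100, 16))        # one channel of CD-quality audio
    print(nyquist_limit_hz(44100))
    print(alias_frequency_hz(30000, 44100))  # tone above the Nyquist limit
```

With a second channel the data rate doubles, which is why stereo files of the same length are twice the size of mono ones.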

Sampling frequencies (KHz) and their common uses:

5
7
11
11.1 -> Minimum quality currently used on personal computers
22.050 -> Very common in computer sound file formats
22.354
24 -> Minimum acceptable quality needed for speech recognition
32 -> An option on some professional audio equipment and in the 32 bit floating point IRCAM format (file name suffix of .sf)
44.1 -> The standard for audio compact discs and high quality personal computer sound
48 -> Used in professional formats, notably Digital Audio Tape (DAT)

Linear and Logarithmic Representation of Dynamics

Additionally, the samples of a sound file can be represented as linear or logarithmic progressions. The unit of measurement of the represented sound pressure is constant from sample to sample in linear encoding, whereas in logarithmic encoding that unit grows as the sample value increases. The latter has the advantage of representing a greater range of sound levels, albeit with higher noise levels. The µ-law and A-law variations of the AU format, originating from Sun Microsystems and NeXT Computer, use logarithmic coding. An 8-bit µ-law sample, for example, can provide the same dynamic range as a 12-bit linear encoded sample.

Demonstration of Difference
A linear, 44.1 KHz, 8-bit sample (.au, 260K)
A logarithmic, 44.1 KHz, 8-bit sample (.au, 260K)
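The difference between linear and logarithmic encoding can also be sketched numerically. The code below uses the continuous µ-law companding curve with µ = 255, which is an assumption made for illustration; it is not the bit-exact segmented encoding used in real .au files, and the function names are hypothetical.

```python
import math

MU = 255  # companding constant commonly used for 8-bit µ-law


def mulaw_compress(x):
    """Map a linear sample in [-1, 1] onto the logarithmic µ-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Compress, then quantize to a signed 8-bit code (-127..127)."""
    return round(mulaw_compress(x) * 127)


def decode_8bit(code):
    """Recover an approximate linear sample from an 8-bit µ-law code."""
    return mulaw_expand(code / 127)


if __name__ == "__main__":
    quiet = 0.001
    print("linear 8-bit code:", round(quiet * 127))
    print("µ-law 8-bit code: ", encode_8bit(quiet))
    print("µ-law decoded:    ", decode_8bit(encode_8bit(quiet)))
```

In this sketch a very quiet sample rounds to zero in linear 8-bit encoding but survives µ-law encoding, at the cost of coarser steps for loud samples.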

Channels

Sound files can express one, two, or more channels of information. The vast majority of formats let you create monophonic or stereophonic files, corresponding with the majority of playback equipment. However, as computer-based audio and video authoring capabilities become more advanced, file formats should grow to describe surround sound and other multi-channel formats.

Audio Compression

As shown in the example above, quality sound files comprise a large amount of data. Audio compression reduces the amount of physical storage space and memory required to store a sound, and therefore reduces the time required to transfer a file. Compression can be lossy, meaning the sound quality will be negatively affected by compression, or lossless, meaning there will be no change in sound quality. Common techniques such as MACE, MPEG, and ADPCM compression are lossy: they provide a great deal of space savings at a reasonable cost in quality. Huffman coding, by contrast, is lossless. The only formats that don't employ compression are "raw" audio files and formats such as Apple's sound resource (snd), although the snd format can contain other types of sound representation.
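The distinction between lossless and lossy compression can be demonstrated with a small sketch. Python's zlib module (DEFLATE, which combines LZ77 with Huffman coding) stands in here for a lossless scheme. The "lossy" function simply discards the low byte of each 16-bit sample, which is a deliberately crude illustration and not one of the codecs named above. All function names are hypothetical.

```python
import math
import struct
import zlib


def sine_samples_16bit(freq_hz, sample_rate_hz, n_samples):
    """Return a sine tone as little-endian signed 16-bit sample bytes."""
    samples = [
        round(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
        for i in range(n_samples)
    ]
    return struct.pack("<%dh" % n_samples, *samples)


def lossless_round_trip(raw):
    """Compress and decompress with zlib; the result matches the input."""
    packed = zlib.compress(raw, 9)
    restored = zlib.decompress(packed)
    return len(packed), restored


def lossy_drop_low_byte(raw):
    """Keep only the high byte of each 16-bit sample (a fixed 2:1 saving)."""
    return raw[1::2]


if __name__ == "__main__":
    raw = sine_samples_16bit(441, 44100, 4410)
    packed_size, restored = lossless_round_trip(raw)
    print(len(raw), "bytes raw,", packed_size, "bytes packed")
    print("identical after round trip:", restored == raw)
    print(len(lossy_drop_low_byte(raw)), "bytes after dropping low bytes")
```

The lossless round trip returns every byte unchanged, whereas the discarded low bytes in the lossy case cannot be recovered.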

The µ-law (pronounced mu-law) file format, for example, is an international standard for compressing voice-quality audio. It has a compression ratio of 2:1. Because it is optimized for speech, it is a standard compression technique for telephone systems in the United States (in Europe, A-law is used). On the Internet it is used for the ".au" file format, alternately known as "Sun audio" or "NeXT" format. The G.721, G.723-24 and G.723-40 ADPCM formats are CCITT standards for compressing 8000-Hz, 14-bit samples into a 32-, 24- or 40-kbps data stream, respectively. These compressed formats have extremely slow decompression rates.

A new compression standard has been proposed by the Interactive Multimedia Association (IMA). The IMA 4:1 audio compression format is intended to compress 16-bit sound with a ratio of 4:1, compressing audio CD-quality sound into one-fourth the space it normally occupies. Apple Computer has integrated IMA 4:1 audio compression into both QuickTime 2.0 and Sound Manager 3.1, and Microsoft has integrated it into Video for Windows.

For more information on digital audio compression, refer to the Digital Equipment Corporation's excellent survey of compression methods. A listing of audio compression hardware can be found at The CERL Sound Group's WWW page.

Common File Formats on the Internet

The most common audio file formats found on the Internet are µ-law (.au), AIFF, WAVE (.wav), Macintosh sound resources (snd), and QuickTime movies (.mov). The single greatest reason for their popularity is cross-platform compatibility and use. Other formats, notably MOD and AIFC files, have attractive characteristics, but have not been widely accepted on UNIX, IBM-PC compatible, and Macintosh platforms. Conversely, newer formats, such as MPEG-compressed audio and video, may become common on the Internet due solely to the comparatively substantial benefits they yield.

Examples of sound files varying in sample rate and word length can be downloaded by selecting one from the table below. The files are in .aiff format, playable by most modern web browsers.

File Format Examples

Example--Format--Sampling Frequency (KHz)--Word Length (bits)--Size
1--AIFF--11.1--8--80K
2--AIFF--11.1--16--152K
3--AIFF--22--8--144K
4--AIFF--22--16--272K
5--AIFF--44.1--8--272K
6--AIFF--44.1--16--528K
7--AIFF--48--8--304K
8--AIFF--48--16--592K
