Creating digital audio

Understanding digital audio

In digital audio recording there are three basic elements involved in the creation of a file: the type of sound signal (audio stream), the type of file format that the sound samples are stored in (container), and the way they are encoded into that format (codec). A codec is a piece of software (and in some cases hardware) used to encode sound signals into a format and enable decoding back to a sound signal you can hear. Compatibility issues with sound files between different software applications can be due to differences in the type of sound signal, the file format, or the codec.

As with digital image capture, audio files can be recorded uncompressed, or by using lossless compression or lossy reduction. Three widely used format standards that reflect each of these categories are WAV (uncompressed), FLAC (lossless compression), and MP3 (lossy reduction). Compact Discs (CDs) use their own specialised format standard, referred to as Red Book audio.

While analogue sound waves are continuous, digital sound consists of a series of discrete samples. The quality is set by the combination of the bit depth (the number of bits used to record each sample) and the sample rate (the number of times per second the signal is measured). The standard for CDs is 16-bit audio at a sample rate of 44,100 samples per second (44.1 kHz) in stereo. Multiplying the bit depth, the sample rate, and the two stereo channels gives a bit rate of about 1,411 kilobits per second. As a result of lossy reduction, where bits are thrown away in the process, a standard MP3 file at 128 kilobits per second contains only about one tenth of this information.
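The bit rate arithmetic can be sketched in a few lines of Python. This is only an illustration; the function name pcm_bit_rate is made up for this example and does not come from any audio library.

```python
def pcm_bit_rate(bit_depth: int, sample_rate_hz: int, channels: int) -> int:
    """Return the bit rate of uncompressed audio in bits per second."""
    return bit_depth * sample_rate_hz * channels


# CD audio: 16 bits per sample, 44,100 samples per second, 2 channels.
cd_bits_per_second = pcm_bit_rate(16, 44_100, 2)
print(cd_bits_per_second / 1000)  # kilobits per second

# Fraction of the CD bit rate that a 128 kilobit-per-second MP3 retains.
print(round(128_000 / cd_bits_per_second, 3))
```

The same function applies to any other combination of bit depth, sample rate, and channel count.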

High-quality uncompressed or lossless audio is the only way to retain close to the level of information available in an analogue waveform, which is why these formats are used for master copies and audio archiving and preservation. A common archiving standard today is 24-bit stereo audio at 48 kHz, creating a bit rate of 2,304 kilobits per second. Professional archivists may use a higher standard again, beyond what is audible to the human ear, when recording or digitising from an analogue source, in order to be confident they have captured as much of the original wave information as possible. The trade-off is that this creates very large file sizes to manage.

An analogue sound wave digitally sampled in 4 bits.
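
As a rough illustration of what sampling at a low bit depth means, the sketch below measures a sine wave at regular intervals and rounds each measurement to one of 16 levels (4 bits). The function names, the 1 Hz tone, and the 8 samples per second are all assumptions chosen to keep the output small; real audio uses far higher values.

```python
import math


def quantise(value: float, bits: int) -> int:
    """Map a value in the range -1.0 to 1.0 onto one of 2**bits integer levels."""
    levels = 2 ** bits
    # Scale from [-1, 1] to [0, levels - 1], then round to the nearest level.
    level = round((value + 1.0) / 2.0 * (levels - 1))
    # Clamp in case the input strays slightly outside the expected range.
    return max(0, min(levels - 1, level))


def sample_sine(frequency_hz: float, sample_rate_hz: float,
                num_samples: int, bits: int) -> list[int]:
    """Sample a sine wave at regular intervals and quantise each sample."""
    return [
        quantise(math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz), bits)
        for n in range(num_samples)
    ]


# One cycle of a 1 Hz wave, measured 8 times, stored in 4 bits per sample.
samples = sample_sine(1.0, 8.0, 8, 4)
print(samples)
```

Each printed number is a level between 0 and 15. Raising the bits argument gives finer steps, and raising the sample rate gives more measurements per second.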

Digital file formats

Uncompressed

The uncompressed WAV format is highly suitable for recording and editing, and can be created and used in almost all recording and editing software. The main limitations of WAV are the inability to embed descriptive metadata into the file, and the maximum file size of 4 gigabytes. Professional sound engineers and archivists use extended formats of WAV, known as the Broadcast Wave Format and RF64, to overcome these limitations. An alternative is to use a software application to manage the metadata when editing or playing back. You can also store the descriptive metadata in plain-text files alongside the sound file when archiving. File size can be managed by using software that splits a long recording into consecutive files, or by limiting the bit rate of the recording.
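
To get a feel for when the 4 gigabyte limit becomes a problem, the following sketch estimates how many hours of audio fit in a single WAV file. It assumes the limit is 2**32 bytes and ignores the small file header; the function name max_wav_hours is hypothetical.

```python
WAV_SIZE_LIMIT_BYTES = 2 ** 32  # assumed limit; header overhead is ignored


def max_wav_hours(bit_depth: int, sample_rate_hz: int, channels: int,
                  limit_bytes: int = WAV_SIZE_LIMIT_BYTES) -> float:
    """Estimate the hours of uncompressed audio that fit within limit_bytes."""
    bytes_per_second = (bit_depth // 8) * sample_rate_hz * channels
    return limit_bytes / bytes_per_second / 3600


print(round(max_wav_hours(16, 44_100, 2), 1))  # CD-quality stereo
print(round(max_wav_hours(24, 48_000, 2), 1))  # 24-bit 48 kHz stereo
```

A recording session longer than the printed figure would need to be split into consecutive files or stored in an extended format such as RF64.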

Lossless compression

The compressed FLAC format is lossless, meaning that when sound samples are encoded and then decoded, no information is lost. FLAC files are most useful for storage and file management because they can embed metadata and the compressed files are usually about half the size of the equivalent WAV file. The limitations of FLAC are largely to do with the number of software applications that currently support the format, which may also limit the usefulness of the metadata FLAC supports.
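
The meaning of "lossless" can be demonstrated with a round trip: compress the samples, decompress them, and check that every byte matches. The sketch below uses Python's general-purpose zlib compressor purely as a stand-in, because the standard library has no FLAC encoder. FLAC uses a different, audio-specific method, so the compression ratio here says nothing about FLAC itself. The helper names are hypothetical.

```python
import math
import struct
import zlib


def make_tone_pcm(frequency_hz: int = 440, sample_rate_hz: int = 44_100,
                  seconds: int = 1) -> bytes:
    """Build 16-bit mono PCM bytes for a sine tone (test material only)."""
    count = sample_rate_hz * seconds
    samples = [
        round(32767 * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz))
        for n in range(count)
    ]
    # "<h" is a little-endian signed 16-bit integer, as used in WAV files.
    return struct.pack(f"<{count}h", *samples)


def round_trip_is_lossless(pcm: bytes) -> bool:
    """Compress and decompress, then confirm the bytes are identical."""
    compressed = zlib.compress(pcm, 9)
    restored = zlib.decompress(compressed)
    return restored == pcm


pcm = make_tone_pcm()
print(round_trip_is_lossless(pcm))
print(len(zlib.compress(pcm, 9)) < len(pcm))
```

A lossy format would fail the first check by design, because the decoded samples differ from the originals.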

Lossy reduction

Applying lossy reduction, the MP3 format is most useful for providing temporary or readily accessible copies of existing digital audio. Lossy reduction uses algorithms to calculate which bits of information can be discarded with the least impact on the sound. The MP3 algorithm is not as efficient as newer lossy formats, but in most cases the differences are minor. The important differences are the reduced sound quality compared to uncompressed or lossless audio sources, and the difficulty of editing or remixing MP3 sources. The most popular open lossy alternative to MP3 is Vorbis, supported by the same foundation that supports FLAC. An advantage of Vorbis is that the format is playable on open-source platforms such as Linux.

Digital audio recording and editing

The first consideration when planning a digital audio project is the source material. Generally you will be copying from an analogue recording, copying from a digital recording or source, or recording sound from a real-world event. Each of these sources requires a different recording and editing strategy.

Copying from an analogue source

If you are copying from an analogue source, such as a cassette tape, you need to convert the sound to digital using a device called an ADC (analogue-to-digital converter). ADCs can be found in computer sound cards, built into microphones and amplifiers, or supplied as separate boxes placed between your source and your recording device.

As the source is analogue, it is desirable to record the best copy possible, which means using the uncompressed WAV format. If possible, record at 24-bit 48-kHz stereo quality. You can always convert a copy of the master file you create to a more accessible format afterwards. More detail on copying from cassette is available in our response about transferring oral histories from cassette.

Copying from a digital source

When you copy from a digital recording, the software settings you use will depend on the original digital format. If you are copying from a CD, there is no value in recording at a quality level higher than the source material (which is 16-bit 44.1-kHz stereo), as, unlike with analogue, no additional information can be gained. If you have the option, copy the disc at the slowest speed, enable error correction, and avoid doing anything else on your computer that may bump your drive or cause your processor activity to surge. This decreases the chance of accidentally introducing errors into your new copy.

For computer-based files, bit-for-bit copying is preferable in most cases. If you need to shift formats, the quality of the result will depend on whether you are copying from an uncompressed or lossless format, or from a lossy reduced format. WAV and FLAC files (along with AIFF, Apple Lossless, and Windows Media Audio Lossless) are easily copied and converted into virtually any other format using freely available software.
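
One way to confirm that a copy really is bit-for-bit identical is to compare checksums of the original and the copy. The sketch below uses SHA-256 from Python's standard library; the function names are hypothetical, and any strong checksum would serve the same purpose.

```python
import hashlib
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large audio files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: str, destination: str) -> bool:
    """Copy a file, then confirm the copy matches the source exactly."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

For example, copy_and_verify("master.wav", "backup/master.wav") returns True only when the two files are identical; both paths here are placeholders. Storing the checksum alongside the archived file also makes it possible to detect corruption later.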

In contrast, lossy reduction results in an irreversible loss of information from the sound sample, and while the lossy algorithms can do a fairly good job the first time, shifting formats from one lossy format to another can make audio unlistenable. If you absolutely have to convert a lossy reduced file such as an MP3, convert it to a lossless or uncompressed format to prevent further information loss. To ensure you are not degrading your audio beyond an acceptable level of audio loss, always quality-check the output using the best pair of headphones or speakers you can find. In addition, keep a copy of the file that you are converting so that if necessary you can re-do the conversion at a later point.

Recording a real-world event

There is a wide variety of professional and semi-professional options available using sound boards, studio-quality microphones, and sophisticated software. For those with more limited funds and skills, there will always be a trade-off. Fortunately, the rise of podcasting has led to a dramatic improvement in the range of techniques and equipment easily accessible for digital recording.

As with pre-recorded analogue material, it is desirable to record an event to the highest quality possible. The two things that will have the greatest effect on the quality of the recording will be the format used to record the event and the microphone. At a minimum you need a recorder that records in CD-quality WAV, AIFF, or lossless FLAC formats, and if possible an external microphone with a stand. There are a number of specialised portable recorders used by radio journalists, interviewers, and researchers that are ideally suited to basic good-quality recording of live events. If you can use an external microphone, use the best quality you can afford. Condenser microphones are best, but even a decent dynamic microphone will be a great improvement over many built-in microphones. If these recorders are out of your reach financially or cannot be borrowed, there are two other options currently that you might consider:

  • Later-generation Apple iPods can record 16-bit WAV files with a special external microphone attachment. Some of these attachments, such as one made by Belkin, also allow you to attach a full-size external microphone and connect to external power. This can be a very cost-effective way of recording a live event or interview.

  • A small laptop or a netbook with AC power and a USB microphone can also be a cost-effective way of recording a live event, using free software like Audacity. USB microphones are often advertised as 'suitable for podcasting' and can be plugged directly into the PC without an external soundcard. Avoid using a microphone plugged into the 3.5 mm jack of your computer's soundcard, as the jack and cable will pick up a lot of electrical noise from the PC that will interfere with your recording.

A USB microphone and netbook can be a cost-effective way of recording events.

Recorders like dictaphones are not designed to produce a lasting recording and record at a very low bit rate using lossy formats such as MP3 and WMA. The results from using a dictaphone will be disappointing, and in some cases impossible to save to a long-term accessible digital format.

If you are recording using a microphone, always switch off any mobile phones, as their radio transmissions can interfere with the recording.

Standards for digital audio recording and editing

A CD-quality or higher audio standard can be achieved with relative ease and fairly basic equipment and software. While there is no ideal encoding standard for lossy formats, the table below provides two minimum lossy standards for access purposes, both being close to CD quality for most listeners.

                 Minimum (safe)                    Best practice
Bit depth        16-bit                            24-bit
Sample rate      44.1 kHz stereo                   48 kHz stereo or higher
Capture format   WAV or AIFF                       WAV
Archival format  FLAC                              Broadcast WAV (BWF)
Access format    MP3 256 kilobits/sec stereo,      FLAC
                 variable bit rate,
                 Ogg Vorbis -q 5 stereo
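
A capture file can be checked against the minimum column of the table with a short script. The sketch below uses the wave module from Python's standard library, which reads plain PCM WAV files; the function name meets_minimum_standard is hypothetical, and it assumes the minimum means at least 16-bit, at least 44.1 kHz, and two channels.

```python
import wave


def meets_minimum_standard(path: str) -> bool:
    """Check a PCM WAV file for at least 16-bit, 44.1 kHz, stereo audio."""
    with wave.open(path, "rb") as wav:
        bit_depth = wav.getsampwidth() * 8  # sample width is given in bytes
        return (
            bit_depth >= 16
            and wav.getframerate() >= 44_100
            and wav.getnchannels() == 2
        )
```

Calling meets_minimum_standard("interview.wav"), where the file name is a placeholder, returns True for a CD-quality stereo recording and False for, say, a mono or 8 kHz file.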