This guide was last revised 22 November 2010
Audio recording, like photography, is a whole professional field in itself. Understanding some of the basics, however, can help you avoid many of the common mistakes and pitfalls involved in digital audio recording and conversion.
In digital audio recording there are three basic elements involved in the creation of a file: the type of sound signal (audio stream), the type of file format that the sound samples are stored in (container), and the way they are encoded into that format (codec). A codec is a piece of software (and in some cases hardware) used to encode sound signals into a format and enable decoding back to a sound signal you can hear. Compatibility issues with sound files between different software applications can be due to differences in the type of sound signal, the file format or the codec.
As with digital image capture, audio files can be recorded uncompressed, or by using lossless compression or lossy reduction. Three widely-used format standards that reflect each of these categories are WAV (uncompressed), FLAC (lossless compression) and MP3 (lossy reduction). Compact Discs use their own specialised format standard, referred to as Red Book audio.
While analogue sound waves are continuous, digital sound consists of a series of discrete samples. The quality is set by the combination of the bit depth (the number of bits used to record each sample) and the sample rate (the number of times per second the signal is measured). The standard for Compact Discs is 16 bit audio at a sample rate of 44,100 samples per second (44.1 kHz) in stereo; multiplying the bit depth, the sample rate and the two channels gives a bit rate of 1,411 kilobits per second. As a result of lossy reduction, where bits are thrown away in the process, a standard MP3 file at 128 kilobits per second contains only about one tenth of this information. High quality uncompressed or lossless audio is the only way to retain close to the level of information available in an analogue wave form, which is why these formats are used for master copies and audio archiving and preservation. A common archiving standard today is 24 bit stereo audio at 48 kHz, creating a bit rate of 2,304 kilobits per second. Professional archivists may use a higher standard again, beyond what is audible to the human ear, when recording or digitising from an analogue source, in order to be confident they have captured as much of the original wave information as possible. The trade-off is that this creates very large files to manage.
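The bit rate figures quoted above can be reproduced with a short calculation. This is a minimal sketch; the function name `pcm_bit_rate` is our own and is used only for illustration.

```python
def pcm_bit_rate(bit_depth: int, sample_rate_hz: int, channels: int = 2) -> int:
    """Return the uncompressed bit rate in bits per second."""
    return bit_depth * sample_rate_hz * channels


cd_rate = pcm_bit_rate(16, 44_100)        # Compact Disc standard, stereo
archive_rate = pcm_bit_rate(24, 48_000)   # common archiving standard, stereo
mp3_rate = 128_000                        # a standard 128 kilobit MP3

print(cd_rate // 1000, archive_rate // 1000)  # 1411 2304
print(round(mp3_rate / cd_rate, 3))           # 0.091, roughly one tenth
```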
An analogue sound wave digitally sampled in 4 bits.
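To illustrate what sampling in 4 bits means, the sketch below measures one cycle of a sine wave at eight points and maps each measurement to one of the 16 levels that 4 bits can represent. The function name `quantise` and the choice of eight sample points are assumptions made for the example, not part of any standard.

```python
import math


def quantise(value: float, bits: int) -> int:
    """Map a sample in the range -1.0..1.0 to one of 2**bits integer levels."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, value))
    # Scale to 0..levels, then clamp so that +1.0 lands on the top level.
    return min(levels - 1, math.floor((clipped + 1.0) / 2.0 * levels))


# One cycle of a sine wave, measured at 8 evenly spaced points.
samples = [math.sin(2 * math.pi * n / 8) for n in range(8)]
codes = [quantise(s, 4) for s in samples]
print(codes)  # [8, 13, 15, 13, 8, 2, 0, 2]
```

With only 16 levels, the smooth curve is reduced to a coarse staircase. At 16 bits there are 65,536 levels, and at 24 bits over 16 million, so the steps become far smaller.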
The uncompressed WAV format was designed as a proprietary standard by Microsoft and IBM for use on PCs. However, the format is openly documented, widely used and has no patent restrictions. The WAV format is highly suitable for recording and editing, and is able to be created and used in almost all recording and editing software. The main limitations of WAV are the inability to embed descriptive metadata into the file, and the maximum file size of 4 Gigabytes. Professional sound engineers and archivists use extended formats of WAV, known as the Broadcast Wave Format and RF64, to overcome these limitations. An alternative is to use a software application to manage the metadata when editing or playing back. You can also store the descriptive metadata in plain text files with the sound file when archiving. File size can be managed by using software that splits a long recording into consecutive files, or by limiting the bit rate of the recording.
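The 4 Gigabyte limit translates into a maximum recording length that depends on the quality settings. The sketch below estimates that length, assuming a simple 44-byte header and a limit of 2^32 - 1 bytes; real files may carry extra header data, so treat the results as approximate. The names `max_wav_seconds` and `hms` are our own.

```python
WAV_MAX_BYTES = 2**32 - 1  # assumption: sizes in the header are stored in 32 bits


def max_wav_seconds(bit_depth, sample_rate_hz, channels=2, header_bytes=44):
    """Estimate how many whole seconds of audio fit in a single WAV file."""
    bytes_per_second = (bit_depth // 8) * sample_rate_hz * channels
    return (WAV_MAX_BYTES - header_bytes) // bytes_per_second


def hms(seconds):
    """Format a number of seconds as hours, minutes and seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


print(hms(max_wav_seconds(16, 44_100)))  # 6h 45m 47s at CD quality
print(hms(max_wav_seconds(24, 48_000)))  # 4h 08m 33s at 24 bit 48 kHz
```

A long event recorded at 24 bit 48 kHz stereo would therefore need to be split into consecutive files roughly every four hours, or recorded in an extended format.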
The compressed FLAC format is an open format and codec licensed by the Xiph.org Foundation under the open source BSD and GPL software licences. FLAC is lossless, meaning that when sound samples are encoded and then decoded, no information is lost. FLAC is also able to embed metadata using its own system, which matches the open Vorbis standard. FLAC files are most useful for storage and file management as, in addition to allowing embedded metadata, the compressed files are usually about half the size of a WAV file. The limitations of FLAC are largely to do with the number of software applications that currently support the format. This may also limit the usefulness of the metadata FLAC supports.
Despite its widespread use and support in media players, MP3 is a proprietary file format subject to patent claims and restrictions that may remain until 2017. Because it applies lossy reduction, the MP3 format is most useful for providing temporary or readily accessible copies of existing digital audio. Lossy reduction uses algorithms to calculate which bits of information can be discarded with the least impact on the sound. The MP3 algorithm is not as efficient as newer lossy formats, but in most cases the differences are minor. The important differences are the reduced sound quality compared to uncompressed or lossless audio sources, and the difficulty of editing or remixing MP3 sources. The most popular open lossy alternative to MP3 is Vorbis, supported by the same foundation that supports FLAC. An advantage of Vorbis is that the format is playable on open source platforms such as Linux.
The first consideration when planning a digital audio project is the source material. Generally you will be copying from an analogue recording, copying from a digital recording or source, or recording sound from a real-world event. Each of these sources requires a different recording and editing strategy.
If you are copying from an analogue source, such as a cassette tape, you need to convert the sound to digital using a device called an ADC (analogue-to-digital converter). These converters are found in computer sound cards, are built into some microphones and amplifiers, or can be separate boxes placed between your source and your recording device.
As the source is analogue, it is desirable to record the best copy possible, which means using the uncompressed WAV format. If possible, record at 24 bit 48 kHz stereo quality. You can always convert a copy of the master file you create to a more accessible format afterwards. More detail on copying from cassette is available in our response about transferring oral histories from cassette.
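Once a master file exists, it is worth confirming that it was actually saved at the intended quality. The sketch below uses the `wave` module from Python's standard library to read a file's settings; the helper names `describe_wav` and `meets_master_standard` and the demo file are hypothetical. Some 24 bit files are written with an extended header that older Python versions may not be able to read, so treat a read error as a prompt to check with other software rather than as proof of a fault.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Read the basic quality settings stored in a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "seconds": wav.getnframes() / wav.getframerate(),
        }


def meets_master_standard(info, bit_depth=24, sample_rate_hz=48_000, channels=2):
    """Check a file against the 24 bit 48 kHz stereo recommendation."""
    return (info["bit_depth"] >= bit_depth
            and info["sample_rate_hz"] >= sample_rate_hz
            and info["channels"] >= channels)


# Demo only: write one second of silence at 24 bit 48 kHz stereo.
demo_path = os.path.join(tempfile.mkdtemp(), "master_demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(3)           # 3 bytes per sample = 24 bits
    wav.setframerate(48_000)
    wav.writeframes(bytes(3 * 2 * 48_000))

info = describe_wav(demo_path)
print(info, meets_master_standard(info))
```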
When you copy from a digital recording, the software settings you use will depend on the original digital format. If you are copying from a Compact Disc, there is no value in recording at a quality level higher than the source (which is 16 bit 44.1 kHz stereo), as unlike analogue, no additional information can be gained. If you have the option, copy the disc at the slowest speed, enable error correction and avoid doing anything else on your computer that may bump your drive or cause your processor activity to surge. This decreases the chance of accidentally introducing errors into your new copy.
For computer based files, bit-for-bit copying is preferable in most cases. If you need to shift formats, the quality of the result will depend on whether you are copying from an uncompressed or lossless format, or from a lossy reduced format. WAV and FLAC files (along with AIFF, Apple lossless, and Windows Media Audio lossless) are easily copied and converted into virtually any other format using freely available software.
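One way to confirm that a copy really is bit-for-bit identical is to compare checksums of the original and the copy. The following is a minimal sketch using Python's standard library; the function names and the demo file names are hypothetical, and SHA-256 is simply one reasonable choice of checksum.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then confirm the copy matches the original exactly."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)


# Demo only: random bytes stand in for an audio file.
work_dir = tempfile.mkdtemp()
original = os.path.join(work_dir, "original.wav")
duplicate = os.path.join(work_dir, "copy.wav")
with open(original, "wb") as handle:
    handle.write(os.urandom(4096))

print(copy_and_verify(original, duplicate))  # True
```

Storing the checksum alongside the archived file also lets you check later that the file has not changed.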
In contrast, lossy reduction results in an irreversible loss of information from the sound sample, and while the lossy algorithms can do a fairly good job the first time, shifting formats from one lossy format to another can make audio unlistenable. If you absolutely have to convert a lossy reduced file such as an MP3, convert it to a lossless or uncompressed format to prevent further information loss. Converting that format to another lossy format or converting directly from one lossy format to another will result in a very significant degrading of the sound. To ensure you are not degrading your audio beyond an acceptable level of audio loss, always quality check the output using the best pair of headphones or speakers you can find. In addition, keep a copy of the file that you are converting so that if necessary you can re-do the conversion at a later point.
There are a wide variety of professional and semi-professional options available using sound boards, studio quality microphones, and sophisticated software. For those with more limited funds and skills, there will always be a trade-off. Fortunately, the rise of podcasting has led to a dramatic improvement in the range of techniques and equipment easily accessible for digital recording.
As with pre-recorded analogue material, it is desirable to record an event to the highest quality possible. The two things that will have the greatest effect on the quality of the recording will be the format used to record the event and the microphone. At a minimum you need a recorder that records in CD-quality WAV, AIFF or lossless FLAC formats and if possible an external microphone with a stand. There are a number of specialised portable recorders used by radio journalists, interviewers and researchers which are ideally suited to basic quality recording of live events. If you can use an external microphone, use the best quality you can afford. Condenser microphones are best, but even a decent dynamic microphone will be a great improvement over many built-in microphones. If these recorders are out of your reach financially or cannot be borrowed, there are two other options currently that you might consider:
A USB microphone and netbook can be a cost-effective way of recording events
Recorders like dictaphones are not designed to produce a lasting recording and record at a very low bitrate using lossy formats such as MP3 and WMA. The results from using a dictaphone will be disappointing, and in some cases impossible to save to a long term accessible digital format.
If you are recording using a microphone, always switch off any mobile phones, as their radio antenna can interfere with the recording.
CD-quality audio or higher can be achieved with relative ease using fairly basic equipment and software. There is no good reason to record or capture audio at a lower standard. While there is no ideal encoding standard for lossy formats, in the table below we have provided two minimum lossy standards for access purposes, both being close to CD-quality for most listeners.
|44.1 kHz stereo||48 kHz stereo|
|WAV or AIFF||WAV|
|FLAC||Broadcast WAV (BWF)|
|MP3 256 kilobits/sec stereo, Ogg Vorbis -q 5 stereo|