Audio Tracks: The Art of Advanced Dialog Processing

Roy W. Rising

For many video projects, the intended message is conveyed by words. For this reason, it’s important to keep the dialog clear and unobstructed by noise while maintaining a signal level that is not masked by sounds in the listening environment. Motion pictures routinely use dialog replacement (looping) to ensure that the best audio is presented.

One of the toughest cases for dialog processing is the TV soap opera. The intent is to sound like a movie, but circumstances make it difficult. The sound stage is filled with unwanted sounds — movement of cameras and personnel, noise from ac lighting equipment, and poor acoustics. The special techniques for combating these problems form a study that can be extended and modified for other dialog problems.

No single device does every job needed for handling dialog. It has become common practice to route all dialog sources through a series of systems on their way to the final mix. Placing these units in a particular order is important. If the sequence is wrong, the results can’t be optimal.

The concept applies filters and de-essers first, so what is removed does not affect the critical dynamics devices that follow. Next in line are compressors and limiters, which reduce the dynamic range to keep quiet speech from being lost and to prevent the occasional burst from overloading. Last comes noise reduction. It is important for the sound to be as good as possible before this stage is applied.

It is important to stress the reason for placing the filter set at the head of the chain. If a sound with sufficient energy to cause a compressor or limiter to act is later removed, the program level will be changing without audible reason. This effect is very subtle. It resembles what you may have experienced when your hearing is fatigued by excessive exposure to loud sounds. The natural compressor in the ear is no longer tracking with the incoming sound. Instead, it is stuck at reduced gain and trying to relax to normal sensitivity.
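The effect of stage order can be demonstrated numerically. The Python sketch below assumes a 48kHz sample rate, a steady 1kHz tone standing in for dialog, and 30Hz rumble that appears halfway through. The filter is a 120Hz high-pass built from the RBJ Audio EQ Cookbook formulas, and the compressor is a simple RMS design with a -30dB threshold and 2:1 ratio; all function names and settings are illustrative, not taken from any product. With the filter first, the tone level should barely move when the rumble starts. With the compressor first, the rumble drives gain reduction, and the tone drops by several dB even though the rumble itself is filtered out afterward.

```python
import math

FS = 48_000  # assumed sample rate, Hz


def highpass(x, f0=120.0, q=0.7071, fs=FS):
    """Second-order high-pass biquad (RBJ Audio EQ Cookbook), direct form I."""
    w0 = 2 * math.pi * f0 / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b1, b2 = (1 + cw) / 2 / a0, -(1 + cw) / a0, (1 + cw) / 2 / a0
    a1, a2 = -2 * cw / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def compress(x, threshold_db=-30.0, ratio=2.0, tau_s=0.1, fs=FS):
    """Feed-forward RMS compressor without makeup gain."""
    coef = math.exp(-1.0 / (tau_s * fs))
    ms, y = 0.0, []
    for xn in x:
        ms = coef * ms + (1 - coef) * xn * xn          # smoothed mean square
        level_db = 10 * math.log10(ms + 1e-12)
        gr_db = max(0.0, (level_db - threshold_db) * (1 - 1 / ratio))
        y.append(xn * 10 ** (-gr_db / 20))
    return y


def rms_db(x, start_s, end_s, fs=FS):
    """RMS level (dB) of x between start_s and end_s seconds."""
    seg = x[int(start_s * fs):int(end_s * fs)]
    return 10 * math.log10(sum(v * v for v in seg) / len(seg))


# Steady 1 kHz tone standing in for dialog; 30 Hz rumble added halfway through.
sig = [0.1 * math.sin(2 * math.pi * 1000 * n / FS)
       + (0.5 * math.sin(2 * math.pi * 30 * n / FS) if n >= FS // 2 else 0.0)
       for n in range(FS)]

filter_first = compress(highpass(sig))     # order recommended in the text
compress_first = highpass(compress(sig))   # filter placed after dynamics

for name, out in (("filter first", filter_first), ("compress first", compress_first)):
    shift = rms_db(out, 0.85, 1.0) - rms_db(out, 0.35, 0.5)
    print(f"{name}: level shift when rumble starts = {shift:+.1f} dB")
```

In the second chain nothing audible explains the level drop, which is exactly the subtle pumping described above.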

Speech contains little useful energy below about 120Hz, so a high-pass filter helps by removing low-frequency noises. At the other end of the spectrum, the harmonics and intelligibility cues of the voice lie mostly below 8kHz, so low-pass filtering helps remove annoying rustles from wardrobe, paper packages, and footsteps. The devices that provide steep HP and LP filtering usually also contain very narrow notch filters, which are helpful for removing ac harmonics at 180Hz and 240Hz. Equipment on the set may contain motors with characteristic whines. Refrigerators and soda machines should be turned off during takes; if they aren’t, a notch can eliminate the whine.
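The filters described above can be sketched as standard second-order sections. This Python example computes coefficients from the RBJ Audio EQ Cookbook formulas and evaluates their magnitude response. The 48kHz sample rate, the Q of 0.7071 (Butterworth) for the high- and low-pass sections, and the Q of 30 for the notch are assumptions chosen for illustration; steeper slopes, like those of dedicated filter units, would come from cascading several such sections.

```python
import cmath
import math

FS = 48_000  # assumed sample rate, Hz


def rbj_biquad(kind, f0, q, fs=FS):
    """Normalized (b, a) coefficients from the RBJ Audio EQ Cookbook."""
    w0 = 2 * math.pi * f0 / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "highpass":
        b = [(1 + cw) / 2, -(1 + cw), (1 + cw) / 2]
    elif kind == "lowpass":
        b = [(1 - cw) / 2, 1 - cw, (1 - cw) / 2]
    elif kind == "notch":
        b = [1.0, -2 * cw, 1.0]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a = [1 + alpha, -2 * cw, 1 - alpha]
    return [v / a[0] for v in b], [v / a[0] for v in a]


def magnitude(coeffs, f, fs=FS):
    """|H(e^jw)| of a biquad at frequency f (Hz)."""
    b, a = coeffs
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    return abs((b[0] + b[1] * z1 + b[2] * z1 ** 2) / (a[0] + a[1] * z1 + a[2] * z1 ** 2))


hp = rbj_biquad("highpass", 120, 0.7071)   # speech high-pass
lp = rbj_biquad("lowpass", 8000, 0.7071)   # rustle low-pass
hum3 = rbj_biquad("notch", 180, 30)        # narrow notch on the 3rd ac harmonic

for f in (60, 120, 180, 1000, 8000, 12000):
    gains = [20 * math.log10(max(magnitude(c, f), 1e-12)) for c in (hp, lp, hum3)]
    print(f, "Hz:", " ".join(f"{g:7.1f} dB" for g in gains))
```

With a Q of 30, the notch is only about 6Hz wide (f0/Q), so it removes the hum component while leaving nearby speech energy essentially untouched.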

Another device that should be ahead of the other dynamics stages is the de-esser. Some voices contain excessive sibilance, heard as a whistling of “s” sounds. A good de-esser is transparent and can be left in the circuit all of the time. When sounds with strong energy in the 5kHz-to-6kHz range are present, the de-esser briefly reduces gain in a narrow band, eliminating annoyances and preventing possible overload of recording and transmission gear. The de-esser also helps with rustling sounds below the low-pass filter’s cutoff.
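One common way to build a de-esser is split-band: a band-pass filter isolates the sibilance region, an envelope follower measures it, and only that band is turned down when it exceeds a threshold. The Python sketch below takes that approach. The 5.5kHz center, Q of 2, -30dB threshold, 10dB maximum cut, and 0.5ms/30ms attack and release are assumptions, not settings from any particular unit. Because the signal minus the band-pass output is exactly the matching notch, the output equals the input whenever no cut is applied, which is one way to get the transparency described above.

```python
import math

FS = 48_000  # assumed sample rate, Hz


def bandpass(x, f0, q, fs=FS):
    """RBJ cookbook band-pass (0 dB peak gain), direct form I."""
    w0 = 2 * math.pi * f0 / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2 * cw / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def de_ess(x, f0=5500.0, q=2.0, threshold_db=-30.0, max_cut_db=10.0,
           attack_ms=0.5, release_ms=30.0, fs=FS):
    """Split-band de-esser: only the sibilance band is turned down."""
    band = bandpass(x, f0, q, fs)
    att = math.exp(-1.0 / (attack_ms * 1e-3 * fs))
    rel = math.exp(-1.0 / (release_ms * 1e-3 * fs))
    env, out = 0.0, []
    for xn, bn in zip(x, band):
        level = abs(bn)
        coef = att if level > env else rel     # fast rise, slower fall
        env = coef * env + (1 - coef) * level
        cut_db = min(max_cut_db, max(0.0, 20 * math.log10(env + 1e-12) - threshold_db))
        gain = 10 ** (-cut_db / 20)
        # x - band is the complementary notch, so gain == 1 returns x unchanged.
        out.append((xn - bn) + gain * bn)
    return out


quiet_vowel = [0.01 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS // 10)]
loud_ess = [0.5 * math.sin(2 * math.pi * 5500 * n / FS) for n in range(FS // 10)]
for name, sig in (("quiet 1kHz", quiet_vowel), ("loud 5.5kHz", loud_ess)):
    out = de_ess(sig)
    half = len(sig) // 2
    ratio = math.sqrt(sum(v * v for v in out[half:]) / sum(v * v for v in sig[half:]))
    print(f"{name}: output/input level = {20 * math.log10(ratio):+.1f} dB")
```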

Next in the chain come compressors and limiters. Above its threshold, a compressor reduces dynamic range by its ratio: 2:1 compression turns a 40dB range into 20dB. Dialog benefits from compression that begins at a threshold set near the quietest speech levels, about -30dB. Two-to-one or 3:1 compression, followed by makeup gain, brings the whispers up to audibility and controls more energetic talking.

A limiter is a compressor with a ratio exceeding about 20:1. When its threshold is reached, higher input produces almost no increase in output. For protection against unexpected outbursts, the threshold may be set at or above 0VU for the system. Good gain riding keeps the compressed average from activating the limiter.
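The static behavior of the compressor and limiter described above can be written as a single hard-knee curve: unity gain below threshold, and a slope of 1/ratio above it. In this Python sketch the compressor uses the -30dB threshold and 2:1 ratio from the text, and the limiter uses 20:1 with its threshold at 0VU (taken as 0dB). The 10dB of makeup gain between them is a hypothetical value, and levels are treated as steady-state dB values rather than audio samples.

```python
def output_level_db(level_db, threshold_db, ratio):
    """Static (hard-knee) curve: unity below threshold, 1/ratio slope above."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def compressor(level_db):
    return output_level_db(level_db, threshold_db=-30.0, ratio=2.0)


def limiter(level_db):
    return output_level_db(level_db, threshold_db=0.0, ratio=20.0)


def dialog_chain(level_db, makeup_db=10.0):
    """Compress, add (hypothetical) makeup gain, then limit; levels re 0VU."""
    return limiter(compressor(level_db) + makeup_db)


for level in (-40, -30, -10, 10, 30):
    print(f"in {level:+4d} dB -> compressed {compressor(level):+6.1f} dB, "
          f"chain {dialog_chain(level):+6.1f} dB")
```

With these settings, a whisper at -30dB is raised to -20dB, a +10dB outburst comes out of the compressor at -10dB and just reaches the limiter threshold after makeup gain, and a +30dB outburst is held to +0.5dB.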

Some dynamics devices provide adjustable attack and release times. Dialog usually benefits from very fast attack and somewhat slower release times. This avoids sudden overload and prevents noticeable rebound of the background between words.
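Attack and release are often implemented as one-pole smoothing of the gain reduction, with a short time constant while reduction is increasing and a longer one while it is recovering. This Python sketch assumes that design, a 48kHz sample rate, a 1ms attack, and a 100ms release; real units define and scale these times in different ways.

```python
import math


def smooth_gain_reduction(target_gr_db, fs=48_000, attack_ms=1.0, release_ms=100.0):
    """One-pole ballistics on gain reduction (dB): fast attack, slower release."""
    att = math.exp(-1.0 / (attack_ms * 1e-3 * fs))
    rel = math.exp(-1.0 / (release_ms * 1e-3 * fs))
    gr, out = 0.0, []
    for target in target_gr_db:
        coef = att if target > gr else rel   # more reduction needed -> attack
        gr = coef * gr + (1 - coef) * target
        out.append(gr)
    return out


# A word that calls for 10 dB of reduction for 10 ms, followed by a pause.
target = [10.0] * 480 + [0.0] * 9600
gr = smooth_gain_reduction(target)
print(f"after 1 ms: {gr[47]:.2f} dB, 100 ms into the pause: {gr[479 + 4800]:.2f} dB")
```

After one attack time constant the reduction has reached about 63% of its target, and after one release time constant it has fallen to about 37% of its starting value, so the background returns gradually during the pause instead of jumping up.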

The most basic form of noise removal is gating. The idea is to attenuate the signal between words so background noise is less objectionable. During speech, the dialog tends to mask the noise. Fast-acting broadband gates operate on the entire spectrum and must be used cautiously. There is a tendency to cut the signal too deeply when 3dB or 6dB may be sufficient.
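The limited-depth idea can be illustrated with a very simple block-based gate. In this Python sketch the 10ms block size and -45dB threshold are assumptions, and the gain switches abruptly between blocks, which would click in practice; a usable gate ramps its gain with attack and release times. The point is the floor: below threshold the signal is reduced by depth_db (6dB here), not muted.

```python
import math


def broadband_gate(x, fs=48_000, block_ms=10.0, threshold_db=-45.0, depth_db=6.0):
    """Block-based broadband gate that attenuates by depth_db instead of muting."""
    n = max(1, int(fs * block_ms / 1000))
    floor = 10 ** (-depth_db / 20)        # 6 dB -> about 0.5, not 0
    out = []
    for start in range(0, len(x), n):
        block = x[start:start + n]
        rms = math.sqrt(sum(v * v for v in block) / len(block))
        gain = 1.0 if 20 * math.log10(rms + 1e-12) >= threshold_db else floor
        out.extend(v * gain for v in block)
    return out


speech = [0.1 * math.sin(2 * math.pi * 300 * n / 48_000) for n in range(480)]
pause = [0.001 * math.sin(2 * math.pi * 300 * n / 48_000) for n in range(480)]
out = broadband_gate(speech + pause)
```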

Dolby A-Type noise reduction from Dolby Labs (San Francisco) began a quiet revolution in dialog processing. Although A-Type was intended to be used only in encode/decode pairs, someone decided to try decode mode alone. Encoding raised the level of low-level signals in four bands; decoding restored the original levels, pushing down the noise as well. It was found that the decode-only process helped reduce the acoustical annoyances that come with dialog and boom mic technique.

At first, Dolby did not favor the decode-only use of Dolby A-Type noise-reduction units. This led to the Cat. No. 43 controller, which provides sliders to adjust the compression (encode) or expansion (decode) in each band. An overall sensitivity control adjusts the threshold. As with a broadband gate, just a little cut can do wonders. In practice, the greatest depth works best in the 70Hz-to-1kHz range; the upper bands need less depth to avoid noticeable effects. The Cat. No. 43 is no longer manufactured by Dolby and is available only on the used-equipment market. It must be used with a model 360 or 361 A-Type noise-reduction unit.
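The decode-only approach amounts to multiband downward expansion: in each band, signals below a threshold are pushed down further, up to a maximum depth. This Python sketch computes static per-band gains only. The band names, the per-band depths standing in for the Cat. No. 43 sliders, the shared -40dB threshold standing in for the sensitivity control, and the 2:1 expansion ratio are all hypothetical, and the sketch does not model the actual A-Type band split or companding law.

```python
def expander_gain_db(level_db, threshold_db, ratio, max_depth_db):
    """Downward expansion below threshold, capped at max_depth_db of cut."""
    if level_db >= threshold_db:
        return 0.0
    return -min(max_depth_db, (threshold_db - level_db) * (ratio - 1))


# Hypothetical four-band "slider" settings: most depth in the lower bands,
# less in the upper bands to avoid audible effects.
BAND_DEPTH_DB = {"low": 8.0, "low_mid": 8.0, "high_mid": 4.0, "high": 3.0}


def band_gains_db(band_levels_db, sensitivity_db=-40.0, ratio=2.0):
    """Per-band gain (dB); sensitivity_db acts as a shared threshold control."""
    return {band: expander_gain_db(band_levels_db[band], sensitivity_db, ratio, depth)
            for band, depth in BAND_DEPTH_DB.items()}


pause = {"low": -60.0, "low_mid": -55.0, "high_mid": -58.0, "high": -62.0}
word = {"low": -30.0, "low_mid": -20.0, "high_mid": -25.0, "high": -35.0}
print("pause:", band_gains_db(pause))
print("word: ", band_gains_db(word))
```

During the pause every band is pulled down only as far as its own depth setting allows; during the word all bands pass at unity.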

Dolby introduced the 430 Series, based on its SR noise-reduction technology. Because SR is a smart processor, the separate band controls were dropped in favor of just two: sensitivity and depth. The system analyzes the energy spectrum and determines what needs to be done. Some users feel a better sense of control with the Cat. No. 43.

Digital signal processing has taken noise removal to a new level. Roland (Los Angeles) brought out the SN-550 Digital Noise/Hum Eliminator a few years ago. It provided multiband gating plus a comb filter capable of removing the harmonics of ac hum, which are called buzz. Its abilities were recognized immediately, but users wanted more control.

Roland followed with the SN-700, which, at the risk of going too far, provides discrete control over every possible parameter. The SN-700 delivers two channels of seven-band noise reduction plus comb filtering.

Each band can be set for depth, threshold, attack, hold, and release times. Dialog benefits from very fast attack time — 1ms. To avoid clipping the ends of sounds, a 5ms hold time gives a delay before a 5ms to 10ms release time eases the gain downward to its best depth. As with previous generations of gating devices, a depth as small as 6dB may be sufficient. To get better performance in the upper bands, thresholds may be set for greater sensitivity rather than less depth, unless both are needed.
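One band of such a gate can be sketched as a small state machine: the gain opens with the attack time, stays open for the hold time after the level drops, then closes with the release time to the set depth. This Python sketch uses the 1ms attack, 5ms hold, 10ms release, and 6dB depth from the text. The threshold, the 48kHz rate, the per-sample level input, and the use of one-pole time constants for attack and release are assumptions, and it is not the SN-700’s internal design.

```python
import math


def gate_gains(levels_db, threshold_db, depth_db=6.0, fs=48_000,
               attack_ms=1.0, hold_ms=5.0, release_ms=10.0):
    """Per-sample gain for one gate band with attack, hold, and release."""
    att = math.exp(-1.0 / (attack_ms * 1e-3 * fs))
    rel = math.exp(-1.0 / (release_ms * 1e-3 * fs))
    hold_n = int(round(hold_ms * 1e-3 * fs))
    gain_db, hold_left, out = -depth_db, 0, []     # start closed (at full depth)
    for level in levels_db:
        if level >= threshold_db:
            hold_left, target = hold_n, 0.0          # open, and re-arm the hold timer
        elif hold_left > 0:
            hold_left, target = hold_left - 1, 0.0   # below threshold, but holding
        else:
            target = -depth_db                       # close only to the set depth
        coef = att if target > gain_db else rel
        gain_db = coef * gain_db + (1 - coef) * target
        out.append(10 ** (gain_db / 20))
    return out


levels = [-20.0] * 4800 + [-60.0] * 4800   # 100 ms of speech, then 100 ms of pause
g = gate_gains(levels, threshold_db=-40.0)
print(f"open: {g[4799]:.3f}, during hold: {g[5000]:.3f}, closed: {g[-1]:.3f}")
```

The gain stays fully open through the hold period and then settles near 0.5 (-6dB) rather than closing completely.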

The digital comb filter can be referenced to the line frequency or adjusted with four-digit accuracy over a range from 20Hz to 10kHz. This is useful for combing the harmonics of other kinds of noise. The upper and lower limits of the affected range are adjustable, as are suppression depth and width, attack, hold and release times, and open gain. The last is valuable when a buzz seems to be “coming and going” with dialog. This indicates the need for partial combing of the buzz not masked by speech.
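A hum comb can be sketched with the feedback form H(z) = k(1 - z^-D)/(1 - g z^-D), where D is the number of samples in one period of the fundamental and k = (1 + g)/2. It has nulls at every multiple of the fundamental and unity gain midway between them. In this Python sketch, g stands in for the width control (closer to 1 gives narrower notches but slower settling), and depth_db stands in for suppression depth, implemented by mixing the combed and uncombed signals. The 48kHz rate and all names are assumptions. The fundamental must divide the sample rate exactly (a unit with four-digit frequency accuracy would need fractional delays), and the range limits and attack, hold, and release controls are not modeled. This form also nulls DC.

```python
import cmath
import math

FS = 48_000  # assumed sample rate, Hz


def comb_params(f0, g, depth_db):
    d = FS / f0
    if d != int(d):
        raise ValueError("sketch needs fs / f0 to be a whole number of samples")
    s = 1 - 10 ** (-depth_db / 20)   # s = 1.0 would mean full suppression
    return int(d), (1 + g) / 2, s


def comb_response(f, f0=60.0, g=0.99, depth_db=40.0):
    """Magnitude at f Hz of x - s*(x - comb(x)), comb = k(1 - z^-D)/(1 - g z^-D)."""
    d, k, s = comb_params(f0, g, depth_db)
    zd = cmath.exp(-2j * math.pi * f * d / FS)    # z^-D on the unit circle
    comb = k * (1 - zd) / (1 - g * zd)
    return abs(1 - s * (1 - comb))


def comb_filter(x, f0=60.0, g=0.99, depth_db=40.0):
    """Time domain: y[n] = k(x[n] - x[n-D]) + g*y[n-D], then partial mix."""
    d, k, s = comb_params(f0, g, depth_db)
    y = [0.0] * len(x)
    for n, xn in enumerate(x):
        x_d = x[n - d] if n >= d else 0.0
        y_d = y[n - d] if n >= d else 0.0
        y[n] = k * (xn - x_d) + g * y_d
    return [xn - s * (xn - yn) for xn, yn in zip(x, y)]


hum = [0.1 * math.sin(2 * math.pi * 60 * n / FS) + 0.1 * math.sin(2 * math.pi * 180 * n / FS)
       for n in range(FS)]
cleaned = comb_filter(hum, g=0.9, depth_db=60.0)
print(f"180 Hz: {comb_response(180.0):.4f}, 210 Hz: {comb_response(210.0):.4f}")
```

Setting depth_db to a modest value leaves part of each harmonic in place, a rough analog of the partial combing the text recommends when buzz is only partly masked by speech.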

Some of the SN-700’s other features include a selection of factory presets, nonvolatile storage for your favorite settings, and MIDI control. The inevitable caveat for so powerful a system is simple: go easy. I’ve found that by recording raw, unprocessed dialog and patiently trying the controls, it is possible to clean up some difficult problems. The controls make it possible to edit parameters temporarily when the need arises and to restore the baseline settings on the fly. Audio is getting to be more like brain surgery!

