How do I separate two voices in audio?

Voice separation refers to the process of isolating individual voices or sounds from a combined audio recording. This allows each vocal part to be edited and mixed independently, which can greatly improve the overall sound quality. There are several reasons why separating voices in audio can be useful:

  • Clarity – When voices overlap or compete in a busy mix, it can make the words hard to understand. Separating them into distinct tracks allows each voice to be heard clearly without interference.
  • Editing – With the voices isolated, you can edit each one independently to adjust volume levels, add effects, or fix mistakes.
  • Mixing – Giving each vocal its own track provides greater flexibility in the final mix. The balance between voices can be precisely controlled.
  • Remixing – Vocals can be easily removed or rearranged in a new mix. This enables creative remixing and mashups.
  • Repurposing – Separated vocal tracks can be reused in other projects. For example, isolating a vocal melody to use in a new song.

In summary, voice separation gives audio engineers and editors more options when working with multi-vocal recordings. It enables clearer, more flexible results compared to working with a crowded composite track.

Understanding Vocal Frequencies

Every human voice has a distinct frequency profile that allows us to distinguish between different people’s voices. A typical adult male voice spans roughly 100Hz to 8kHz, with the fundamental frequency falling around 85Hz to 155Hz and the harmonics filling out the range above it. Female voices generally sit higher: shorter, lighter vocal folds vibrate faster, producing fundamental frequencies of roughly 165Hz to 255Hz on average (https://www.dpamicrophones.com/mic-university/facts-about-speech-intelligibility).

There can be variation even within those ranges based on factors like age, vocal effort, and pitch. However, each person’s unique physical vocal characteristics shape their identifiable tonal quality and frequency profile. Understanding these distinctions is key to separating voices by targeting the different frequency ranges in audio editing.
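
As an illustration, the dominant pitch of each voice can be estimated from a short frame with a simple autocorrelation, which tells you which fundamental range each voice occupies before you reach for EQ. A minimal sketch using NumPy; the function name and synthetic test tones are ours, not from any particular tool:

```python
import numpy as np

def estimate_f0(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency (pitch) of a short voiced
    frame via autocorrelation, searching only typical speech lags."""
    windowed = frame * np.hanning(len(frame))
    corr = np.correlate(windowed, windowed, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)          # shortest allowed period
    lag_max = int(sample_rate / fmin)          # longest allowed period
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / lag

# Synthetic stand-ins for a lower and a higher voice
sr = 16000
t = np.arange(2048) / sr
low_voice = np.sin(2 * np.pi * 120 * t)    # 120 Hz fundamental
high_voice = np.sin(2 * np.pi * 220 * t)   # 220 Hz fundamental
# estimate_f0 lands within a few Hz of 120 and 220 respectively
```

Real speech is noisier than a pure sine, so production pitch trackers add voicing detection and octave-error checks, but the core idea is the same.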

Using EQ to Reduce Clashing Frequencies

One of the most effective ways to separate two voices in a mix is by using EQ to reduce frequencies that are clashing between the vocals. Most of the energy that carries speech intelligibility sits between roughly 300Hz and 3kHz, with fundamental tones around 100Hz for a bass voice and up to about 300Hz for a soprano.

To separate two vocals, you can apply surgical EQ cuts in the midrange frequencies where the voices are competing. For example, if you have a male and female vocal that are clashing around 500Hz, you could apply a tight notch EQ cut between 400-600Hz on the female vocal to reduce some of those low midrange frequencies. Or on the male vocal you could reduce some of the upper midrange frequencies around 2kHz that might clash with the female vocal’s range.

The key is to identify the specific problem frequencies with slow EQ sweeps and careful listening, then apply narrow, strategic cuts that reduce each vocal only where the other one lives. This effectively “carves out” space for both vocals to breathe. Avoid over-EQing, as too many cuts will undermine the tone. Subtle, targeted EQ moves are best for separating voices.
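
In a DAW you would make this cut with a parametric EQ, but the same notch can be sketched offline. A minimal example using SciPy’s `iirnotch` filter (assuming NumPy and SciPy are available); the 500Hz clash frequency mirrors the male/female example above:

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

def notch_cut(audio, sample_rate, center_hz, q=4.0):
    """Narrow notch cut at center_hz; higher q = narrower notch.
    filtfilt runs the filter forward and backward for zero phase."""
    b, a = iirnotch(center_hz, q, fs=sample_rate)
    return filtfilt(b, a, audio)

# Toy "female vocal": clashing 500 Hz content plus 2 kHz content
sr = 44100
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 2000 * t)
carved = notch_cut(vocal, sr, center_hz=500.0)
# The 500 Hz component is strongly attenuated; 2 kHz is left intact
```

A real parametric EQ cut is gentler than a full notch; the sketch simply shows the “attenuate only the clashing band” idea in its most extreme form.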

Noise Reduction

Using noise reduction tools is an effective way to isolate voices by reducing background noise that clashes with vocal frequencies. Common techniques include:

Applying noise reduction effects like those in Audacity, Adobe Audition, or other audio editors. These work by analyzing the audio to identify steady noise, then reducing those frequencies while preserving the rest of the audio spectrum (Apple Support, 2023). This helps reduce ambient background sounds like fans, traffic, or other steady noises.

Tools like Izotope RX’s Dialogue Isolate utilize machine learning algorithms to separate speech from noise more intelligently (Descript, 2023). They can isolate and remove complex variable background noise while retaining the clarity of voices.

Noise reduction won’t completely isolate voices, but is an important first step. When used alongside other techniques like EQ and panning, it can significantly reduce background noise that interferes with vocal clarity in a mix.
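
The “analyze steady noise, then reduce those frequencies” approach described above is classic spectral subtraction. A simplified sketch with SciPy’s STFT, assuming you have a noise-only clip to learn from; real tools add smoothing and over-subtraction controls that this sketch omits:

```python
import numpy as np
from scipy.signal import stft, istft

def reduce_noise(audio, noise_clip, sample_rate, nperseg=1024):
    """Spectral subtraction: learn the average magnitude spectrum of a
    noise-only clip, subtract it from every frame of the noisy audio,
    floor at zero, and resynthesize using the original phase."""
    _, _, noise_spec = stft(noise_clip, sample_rate, nperseg=nperseg)
    noise_mag = np.abs(noise_spec).mean(axis=1, keepdims=True)

    _, _, spec = stft(audio, sample_rate, nperseg=nperseg)
    clean_mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    _, clean = istft(clean_mag * np.exp(1j * np.angle(spec)),
                     sample_rate, nperseg=nperseg)
    return clean[:len(audio)]

# A steady 300 Hz "voice" buried in broadband noise
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(2 * sr) / sr
noise = 0.3 * rng.standard_normal(2 * sr)
noisy = np.sin(2 * np.pi * 300 * t) + noise
denoised = reduce_noise(noisy, noise, sr)
```

This hard floor at zero is what produces the “musical noise” artifacts commercial denoisers work hard to suppress, which is one reason the ML-based tools above tend to sound cleaner.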

Panning

Panning vocals and instruments to opposite sides of the stereo field is one of the most effective ways to create separation between multiple voices in a mix. This technique takes advantage of the natural spatial separation we perceive between sounds coming from the left versus the right.

For example, you can keep the main lead vocal in the center while panning backing vocals slightly left and right. Any doubled lead vocals can be panned opposite each other. As one Gearspace thread puts it, “If you have three extra vocals, pan them left, right, and center.”

Wider panning creates more separation, but can cause an unnatural or disjointed feel if taken too far. Subtle panning of 10-30% left or right is often sufficient. Also remember to avoid panning important center elements like lead vocals and bass fully to the sides.
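
Under the hood, panning is just a pair of gains per channel. A minimal constant-power panner, which keeps perceived loudness steady as a source moves across the stereo field (a common convention, not tied to any particular DAW):

```python
import numpy as np

def pan(mono, position):
    """Constant-power pan law: position in [-1, 1] (-1 = hard left,
    0 = center, +1 = hard right). Returns (left, right) channels.
    Total power L^2 + R^2 stays constant across positions."""
    angle = (position + 1.0) * np.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    return mono * np.cos(angle), mono * np.sin(angle)

# Lead dead center, two backing vocals panned ~20% left and right
lead, back_a, back_b = (np.ones(4) for _ in range(3))  # placeholder audio
left = pan(lead, 0.0)[0] + pan(back_a, -0.2)[0] + pan(back_b, 0.2)[0]
right = pan(lead, 0.0)[1] + pan(back_a, -0.2)[1] + pan(back_b, 0.2)[1]
```

The sine/cosine curve is why a centered source plays at about -3dB in each channel rather than full scale; a simple linear pan would make sources dip in loudness as they cross the center.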

Volume Automation

One technique to help separate voices in audio is volume automation: adjusting the volume levels of individual tracks over time to highlight certain voices. Boosting the volume during one vocal part while lowering the other tracks makes that voice stand out more clearly.

Volume automation can be time consuming, but it gives you fine-grained control. You can draw automation curves in your DAW to raise and dip levels precisely where needed, and this selective boosting and cutting of specific voices helps differentiate them in the mix.

Some engineers avoid manual volume automation because of the tedious work involved. When done well, however, it is an effective way to separate overlapping voices by highlighting the most important parts.
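
Conceptually, an automation curve is just a time-varying gain. A minimal sketch that linearly interpolates breakpoints the way a DAW automation lane does (the breakpoint values here are illustrative):

```python
import numpy as np

def apply_automation(audio, sample_rate, points):
    """Apply a volume automation curve given as (time_sec, gain)
    breakpoints, linearly interpolated like a DAW automation lane."""
    times = np.arange(len(audio)) / sample_rate
    break_t, break_g = zip(*points)
    return audio * np.interp(times, break_t, break_g)

# Duck voice A to 30% while voice B speaks (1.0 s to 2.0 s)
sr = 8000
voice_a = np.ones(3 * sr)   # placeholder signal at unity level
ducked = apply_automation(voice_a, sr,
                          [(0.0, 1.0), (0.9, 1.0), (1.0, 0.3),
                           (2.0, 0.3), (2.1, 1.0), (3.0, 1.0)])
```

The short ramps at 0.9s and 2.1s matter: jumping the gain instantly would produce an audible click, which is why DAW automation lanes interpolate between breakpoints.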

Dereverberation

Reverberation occurs when sound waves reflect off surfaces like walls, producing a reverb “tail” after the initial sound. This can muddy the clarity of vocals and make voices sound distant or hollow. Dereverberation aims to reduce or remove this reverb to clean up the audio quality.

There are a few common techniques used for dereverberation:

  • Spectral subtraction methods analyze the spectral decay of reverberant signals and attempt to subtract the reverb.
  • Prediction-based methods such as WPE (weighted prediction error) estimate the late reverberation from preceding samples and subtract it; newer deep-neural-network approaches learn to remove the reverb tail directly.
  • Beamforming techniques use microphone arrays to focus on direct sound waves coming from the desired source while minimizing picked up reflections.
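
The beamforming idea in the last bullet can be sketched with a toy delay-and-sum beamformer: align each microphone by its known delay to the source, then average. This example uses integer sample delays and synthetic signals for simplicity:

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Delay-and-sum beamformer: shift each channel by its known
    integer sample delay toward the source, then average. Sound from
    the steered direction adds coherently; noise and reflections from
    other directions add incoherently and are attenuated.
    (np.roll wraps at the edges -- acceptable for this sketch.)"""
    aligned = [np.roll(sig, -d) for sig, d in zip(mic_signals, delays_samples)]
    return np.mean(aligned, axis=0)

# Two mics picking up the same source plus independent room noise;
# the direct sound reaches mic 2 three samples late
rng = np.random.default_rng(1)
n = 8000
source = rng.standard_normal(n)
mic1 = source + 0.5 * rng.standard_normal(n)
mic2 = np.roll(source, 3) + 0.5 * rng.standard_normal(n)
out = delay_and_sum([mic1, mic2], [0, 3])
# Residual noise in `out` is lower than on either single mic
```

Real beamformers estimate fractional delays from the array geometry and source direction, but the averaging step is the same: more microphones, more noise and reflection rejection.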

Advanced dereverberation plugins like Izotope RX leverage machine learning algorithms to identify reverb and selectively reduce it. They can provide very transparent results while preserving the original vocal tone.

Overall, dereverberation is an effective way to reduce hollowness and echo for cleaner, closer-sounding vocals in a mix when you can’t record in an ideal dry space.

Izotope RX Dialogue Isolate

Izotope’s RX software includes a powerful tool called Dialogue Isolate that leverages AI technology to isolate and separate speech from background noise in audio files. Dialogue Isolate uses a deep neural network trained on a large corpus of recordings to identify and extract the fundamental elements of human speech.

This tool analyzes the input audio to pinpoint speech components like fricatives, plosives, and nasal sounds. It can accurately isolate dialogue even in very noisy conditions, extracting just the vocal elements while suppressing other sounds. The separated dialogue audio retains the clarity and intelligibility of the original recording.

Dialogue Isolate represents a major leap forward in dialogue editing technology. It can save enormous amounts of time compared to tedious manual editing of waveforms. The tool is intuitive and easy to use – you simply select the vocal range you want to isolate and Dialogue Isolate handles the complex separation process behind the scenes.

For film, TV, and podcast editing workflows, Dialogue Isolate makes it fast and simple to clean up location audio, remove background noise, and prepare clear dialogue tracks for post-production. It’s an indispensable tool for today’s audio engineers and content creators.

Adobe Audition Speech Isolation

Adobe Audition includes a feature called Speech Isolation that can help separate dialogue and vocals from background noise and music. This tool analyzes the frequencies in the audio to isolate just the frequencies associated with human speech.

To use Speech Isolation in Adobe Audition:

  1. Open the audio file containing the dialogue and background audio in Adobe Audition.
  2. Select the section of audio you want to isolate.
  3. Go to Effects > Special > Speech Isolation. This will open the Speech Isolation panel.
  4. Adjust the sliders for Reduction and Suppression to fine-tune the isolation. Reduction removes background noise while Suppression removes background music.
  5. Preview the isolated audio to ensure clear dialogue. You may need to adjust the sliders to get the optimal balance.
  6. When satisfied, click Apply to render the separation.

The key to Speech Isolation is that Adobe’s AI analyzes the entire frequency spectrum in the audio file to identify the particular frequencies associated with human speech. By suppressing all other frequencies, the dialogue is effectively separated while the background audio gets removed. This provides a quick way to isolate speech without needing to manually edit frequencies with EQ.

For optimal results, try to capture dialogue with limited background noise in the original recording. The Speech Isolation tool works best when there is plenty of clean dialogue for the AI to analyze. You may get lower quality results isolating dialogue from busy background audio.

Conclusion

Separating voices in audio can be challenging but is possible with the right tools and techniques. A few key points to remember:

  • Use EQ to isolate the frequency ranges where each voice lives.
  • Carefully adjust volume levels for each isolated track.
  • Pan the voices apart to separate them spatially.
  • Clean up background noise and reverb tails that cause voices to bleed together.
  • Leverage dedicated tools like Izotope RX Dialogue Isolate for best results.
  • Work methodically and listen closely when adjusting parameters.
  • Some blending is inevitable so aim for the best separation possible, not perfection.

With practice and these tips, you can successfully tease apart individual voices in multi-vocal recordings. Just take it slow and trust your ears. The end result will be much cleaner, isolated vocal tracks.
