How do I make my audio 3D?

3D audio, also known as spatial audio, is a technique that takes advantage of how humans locate sounds in three-dimensional space to create an immersive listening experience. It aims to make audio seem like it is coming from all directions, as it would in real life, rather than just from the left and right channels of a stereo mix (https://www.highfidelity.com/blog/what-is-3d-audio).

3D audio works by using head-related transfer functions (HRTFs), algorithms based on how sound reaches our ears from various points in space, to process audio signals. By applying subtle delays and filters, 3D audio can make sounds seem like they are coming from in front, behind, above, below, or anywhere around the listener (https://catalyst-magazine.org/articles/the-science-behind-3d-sound/). This creates a lifelike soundscape and spatial impression.

Applications of 3D audio include virtual reality, augmented reality, gaming, music, movies, and other entertainment. It provides greater immersion than standard stereo audio and gives creators more control over the audio environment. Benefits include enhanced realism, improved directionality, and deeper emotional impact.

How 3D Audio is Created

There are a few key techniques used to create 3D audio effects:

Binaural recording uses two microphones to capture sound from two different positions, mimicking how human ears hear sound. This allows for incredibly realistic 3D audio when listened to with headphones. The subtle differences between the two audio channels trick the brain into perceiving direction and space.

HRTFs (head-related transfer functions) are filters applied to sounds to simulate how they reach the eardrum based on the size and shape of a person’s head, ears, and torso. This helps create a sense of space and direction when listening on ordinary headphones. HRTFs are unique to each individual, but averaged HRTFs work decently for most people.

Object-based audio involves mixing sounds as independent “objects” that have position data encoded into them. This allows for adaptive 3D audio rendering based on the speaker setup, giving control over where sounds are positioned. Object-based audio provides immersive 3D sound with height/elevation.

Ambisonics is a full-sphere surround sound technique that captures and encodes sound fields. This allows 3D audio to be adapted for different speaker layouts. Ambisonics is based on spherical harmonics and decomposing sound waves mathematically.

By recording binaurally, using HRTF processing, mixing object-based audio, and employing ambisonics, 3D audio can be crafted that immerses listeners in a spatial soundscape.

Sources:

https://en.wikipedia.org/wiki/3D_audio_effect

The Science Behind 3D Sound (https://catalyst-magazine.org/articles/the-science-behind-3d-sound/)

3D Audio Formats

There are several major 3D audio formats used for creating immersive sound experiences:

Dolby Atmos

Dolby Atmos is an object-based 3D audio technology developed by Dolby Laboratories. It allows sound designers to mix audio objects in a 3D space, giving them independent movement and placement around the listener (VR Tonung). Atmos is used in movie theaters equipped with Atmos-enabled speaker systems, home theater systems using in-ceiling or up-firing speakers, and in headphones using binaural rendering. Major streaming services like Netflix and Disney+ are also releasing Atmos mixes of their original content.

DTS:X

DTS:X is DTS’s object-based spatial audio format similar to Dolby Atmos. It can encode audio objects with flexible positioning and movement in a 3D space. DTS:X can be delivered over traditional speaker layouts or via binaural rendering to headphones. Gaming platforms like Xbox Series X support DTS:X for more immersive gameplay audio (Digital Trends).

Auro-3D

Developed by Auro Technologies, Auro-3D is a “channel-based” 3D audio format, using discrete channels similar to surround sound rather than objects. It supports standard 5.1/7.1 layouts and “height” channels above the listener for overhead sound effects. Auro licenses its format to studios and hardware partners but has less adoption than Dolby Atmos or DTS:X.

Sony 360 Reality Audio

Sony 360 Reality Audio is an object-based 3D music format optimized for streaming services. Using a sound field creation engine, it can place audio objects around the listener and render them to various playback systems from headphones to home theater. Sony is partnering with music services like Amazon Music HD, Tidal, and Deezer to provide 360 Reality Audio content (VR Tonung).

3D Audio Hardware

To create an immersive 3D audio experience, special hardware is typically used during both the production and listening stages. On the production side, 3D microphones like 3Di’s that capture audio from all directions are used to record the spatial characteristics of the environment. These may be arrays of omnidirectional mics that together capture a full 360-degree soundfield, or ambisonic mics composed of several capsules pointing in different directions.

During mixing, a 3D-capable mixing console is needed to manipulate all the audio elements in a 3D space. Companies like Waves make plugins and hardware for mixing in 3D. On the listening side, speaker setups specifically designed for surround sound can accurately reproduce 3D audio. These range from surround sound systems with speakers placed around the room, to binaural headphone setups that simulate 3D space using just two earbuds.

Newer formats like Dolby Atmos and DTS:X also make use of height channels, so in-ceiling or upward-firing speakers are needed for loudspeaker playback. The precise speaker configurations differ between the various 3D audio formats. For personal listening, spatial audio headphones from companies like Apple and Sony simulate 3D space by processing audio based on head-tracking.

3D Audio Production Software

3D audio production requires specialized software and plugins that allow sound designers and engineers to mix and master audio in a 3D space, rather than just standard stereo. There are both paid and free options available.

One powerful paid tool is DearVR Music by Dear Reality. This is an AAX/VST3 plugin that can spatialize audio and turn your entire DAW session into an immersive 3D mix. It supports many 3D formats like ambisonics and allows for mixing with object-based audio and positional tracking.

On the free side, Envelop for Live by Envelop is an open-source set of Max for Live devices for Ableton Live Suite. It includes tools for 3D panning, ambisonics mixing, and binaural rendering. This allows producers to experiment with 3D music production at no cost beyond the Live Suite license.

Creating Binaural Audio

Binaural audio recording aims to capture a 3D stereo sound that recreates what the human ears hear. To record binaural audio, specialized microphones are used to mimic how the human head affects sound. These binaural microphones typically use an in-ear design with two omni-directional microphone capsules spaced apart to approximate the distance between the ears. This setup allows the microphones to capture subtle differences in timing, intensity and spectrum of the sound hitting each ear, known as interaural time differences (ITDs), interaural level differences (ILDs) and interaural spectral differences.
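The timing and level cues described above can be approximated in code. The following is a minimal sketch, not a production spatializer: it uses Woodworth’s spherical-head formula for the ITD, and the head radius, the 6 dB maximum level difference, and the function names are all assumptions made for illustration. Spectral differences are not modeled.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # metres per second in air at about 20 °C

def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg is in [-90, 90]; positive means the source is to the right.
    """
    theta = math.radians(abs(azimuth_deg))
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)

def pan_with_itd_ild(mono, azimuth_deg, sample_rate=48000, max_ild_db=6.0):
    """Return (left, right) sample lists with a delayed, quieter far ear.

    The ILD model (a sine law capped at max_ild_db) is a crude assumption.
    """
    delay = round(abs(woodworth_itd(azimuth_deg)) * sample_rate)
    ild_db = max_ild_db * abs(math.sin(math.radians(azimuth_deg)))
    far_gain = 10 ** (-ild_db / 20)
    near = list(mono) + [0.0] * delay               # near ear hears it first
    far = [0.0] * delay + [s * far_gain for s in mono]
    return (far, near) if azimuth_deg >= 0 else (near, far)
```

With these assumed constants, a source directly to one side gives an ITD of roughly 0.66 ms, which is in the range usually quoted for human listeners.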

Another technique for producing binaural audio is using head-related transfer functions (HRTFs). HRTFs model how sound reaching the listener is filtered by the head, pinnae, ear canals and torso. By applying an appropriate pair of HRTF filters to a mono sound, a binaural effect can be created. However, generic HRTFs derived from dummy heads may not sound natural for every individual. Some 3D audio production tools allow customization of HRTFs based on photos of the listener’s ears to improve realism.
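In the time domain, applying a pair of HRTF filters amounts to convolving the mono signal with a left and a right head-related impulse response (HRIR). The sketch below shows only that step. The two HRIRs are made-up toy values, not measured data, and the function names are hypothetical; a real project would load measured HRIRs and use an FFT-based convolution for speed.

```python
def convolve(signal, kernel):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def binauralize(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs for a source on the right (illustrative values only):
# the right ear gets the sound first and louder, the left ear later and softer.
toy_hrir_right = [1.0, 0.3]
toy_hrir_left = [0.0, 0.0, 0.5, 0.2]

left, right = binauralize([1.0, 0.0, -1.0], toy_hrir_left, toy_hrir_right)
```

Changing the apparent direction of the sound means swapping in the HRIR pair measured for that direction, which is why HRTF datasets contain many pairs.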

3D audio effects like occlusion, obstruction, and reverberation can further enhance binaural recordings. These effects simulate how sound is influenced by objects and the environment during propagation before reaching the listener’s ears. When combined with binaural techniques, they help recreate a fully immersive 3D auditory experience.
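As a rough illustration of one such effect, occlusion is often approximated by making the sound quieter and duller, since an obstacle blocks high frequencies more than low ones. The sketch below uses a one-pole low-pass filter plus a gain. The cutoff, the gain, and the function name are assumptions chosen for illustration and do not come from any particular audio engine.

```python
import math

def occlude(samples, cutoff_hz=800.0, gain=0.5, sample_rate=48000):
    """Muffle a signal: one-pole low-pass followed by attenuation."""
    # Smoothing coefficient of a one-pole low-pass filter.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # low-pass: follow the input slowly
        out.append(gain * state)       # then attenuate
    return out
```

A game engine would typically vary the cutoff and gain continuously depending on how much geometry sits between the source and the listener.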

Object-Based 3D Audio

Object-based 3D audio is a technique that allows sounds to be positioned and moved in three-dimensional space. Instead of mixing down all sounds to a stereo or surround output, object-based audio keeps each sound as an independent “object” with associated metadata like position, size and trajectory (Sound Particles, 2022).

These sound objects contain positional data like x, y and z coordinates that place them at specific points in the 3D space. The objects can then be mixed in real-time based on their location metadata, allowing them to be moved and panned seamlessly during playback. This gives creators much more control and flexibility compared to channel-based formats (LinkedIn, 2023).
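A small sketch can make the separation between the object and the renderer concrete. Here each object carries only a name and x, y, z coordinates, and a renderer turns that metadata into speaker gains at playback time. The class, the axis convention, and the panning law (a cardioid-shaped weighting normalized to constant power) are all assumptions made for illustration; commercial renderers use more sophisticated methods, and this one ignores height.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    x: float  # metres, positive = right of the listener
    y: float  # metres, positive = in front of the listener
    z: float  # metres, positive = above (unused by this flat renderer)

def speaker_gains(obj, speaker_azimuths_deg):
    """Gains for a horizontal ring of speakers (azimuth 0 = front, + = right)."""
    source_az = math.atan2(obj.x, obj.y)
    raw = []
    for az in speaker_azimuths_deg:
        diff = source_az - math.radians(az)
        raw.append(0.5 * (1.0 + math.cos(diff)))   # closer speakers get more
    norm = math.sqrt(sum(g * g for g in raw)) or 1.0
    return [g / norm for g in raw]                 # total power stays at 1

bird = AudioObject("bird", x=1.0, y=0.0, z=0.0)            # directly to the right
stereo = speaker_gains(bird, [-30, 30])                    # left, right
quad = speaker_gains(bird, [-45, 45, 135, -135])           # four-speaker ring
```

The same `bird` object is rendered to two different layouts without being remixed, which is the flexibility that object-based formats are built around.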

A key advantage of object-based 3D audio is that it doesn’t rely on a fixed speaker setup. The sound objects can be rendered to any system from headphones to immersive formats like Dolby Atmos. This makes object-based audio very versatile for distributing 3D sound content to end users. Overall, treating sounds as objects with positional data is a powerful approach for working with 3D audio creatively.

Ambisonics for 3D Audio

Ambisonics is a full-sphere surround sound technique that can capture and reproduce 3D audio scenes. It uses a special microphone array called a Soundfield microphone to capture audio in all directions. The audio is then encoded into an Ambisonic format like B-format that contains directional information. When played back through a decoder, the original 3D soundfield is recreated.
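B-format can also be synthesized from a mono source rather than recorded. The sketch below encodes one sample into the four first-order channels, assuming the traditional convention in which W is scaled by 1/√2, azimuth is measured counterclockwise from the front (so positive is to the left), and elevation is positive upward. Other conventions, such as AmbiX, order and scale the channels differently, so treat the exact scaling here as an assumption.

```python
import math

def encode_bformat(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample into first-order B-format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front/back
    y = sample * math.sin(az) * math.cos(el)   # left/right
    z = sample * math.sin(el)                  # up/down
    return w, x, y, z
```

A source straight ahead lands entirely in W and X, one directly to the left in W and Y, and one overhead in W and Z.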

Ambisonics differs from stereo or surround sound in that it is based on spherical harmonics rather than speaker channels. This means that Ambisonic audio can be rotated, tilted, zoomed and manipulated without distorting the image. First-order Ambisonics uses 4 channels, while higher orders use more (9 for second order, 16 for third), allowing for greater precision in localization.
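Both points can be shown in a few lines. A full-sphere Ambisonic signal of order N needs (N + 1)² channels, and rotating a first-order soundfield about the vertical axis is just a 2D rotation of the X and Y channels. The function names are hypothetical, and the sketch assumes X points forward, Y points left, and positive angles turn the scene counterclockwise.

```python
import math

def ambisonic_channels(order):
    """Number of channels for full-sphere Ambisonics of a given order."""
    return (order + 1) ** 2

def rotate_yaw(bformat, angle_deg):
    """Rotate a first-order (W, X, Y, Z) frame about the vertical axis."""
    w, x, y, z = bformat
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z   # W and Z are unaffected by yaw
```

This is the operation a head-tracked player performs when the listener turns their head: the whole scene is counter-rotated before decoding, with no need to touch the original mix.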

To create 3D audio using Ambisonics, sound sources need to be panned to locations in 3D space during mixing. Popular digital audio workstations like Pro Tools, Reaper and Nuendo have Ambisonic plugins that can decode, rotate and mix Ambisonic audio. The mixed Ambisonic audio then has to be decoded again for playback through headphones or a loudspeaker array to render the 3D audio.
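One of the simplest possible decoders points a virtual cardioid microphone into the soundfield for each output. The sketch below decodes a first-order frame to stereo this way. It assumes the traditional B-format scaling (W carries a 1/√2 factor, X points forward, Y points left), the default ±90 degree microphone angles are an arbitrary choice, and the height channel Z is ignored. It is not a binaural decode, which would apply HRTFs instead.

```python
import math

def virtual_cardioid(w, x, y, mic_azimuth_deg):
    """Signal picked up by a horizontal virtual cardioid aimed at an azimuth."""
    az = math.radians(mic_azimuth_deg)
    return 0.5 * (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az))

def decode_stereo(w, x, y, z, spread_deg=90.0):
    """Decode one B-format frame to (left, right); z is unused here."""
    left = virtual_cardioid(w, x, y, +spread_deg)    # aimed to the left
    right = virtual_cardioid(w, x, y, -spread_deg)   # aimed to the right
    return left, right
```

Decoding to a loudspeaker array follows the same idea with one virtual microphone (or a more carefully designed decoding matrix) per speaker.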

Compared to binaural or object-based methods, Ambisonics provides a good balance of workflow flexibility and 3D accuracy. It doesn’t require personalized HRTFs and allows manipulation after recording. The additional channels also capture more spatial detail than stereo. Ambisonics is popular for VR, 360 video and spatial music productions. Ambi Head HD is a plugin for converting Ambisonic audio to binaural 3D audio.

Delivering 3D Audio

There are a few main methods for delivering 3D audio to listeners. The delivery method will depend on the playback device and platform.

For personal listening, 3D audio can be delivered through headphones. Specialized binaural audio mixes are designed specifically for headphone playback and create immersive 3D sound.

For home theater systems, certain receivers and speakers support 3D audio formats like Auro-3D. Auro-3D has specific setup requirements for speaker placement to properly reproduce the 3D space. Streaming platforms and Blu-ray discs may have Auro-3D audio tracks available.

Game consoles and VR headsets have built-in support for spatial 3D audio formats. Games can be mixed with surround sound or object-based audio to create 3D soundscapes when using headphones or compatible speaker setups.

On mobile devices and computers, 3D audio support depends on the hardware, OS, and apps. Spatial audio formats like Dolby Atmos can be delivered through Apple devices and Windows 10. Streaming services like Apple Music and Tidal offer some 3D audio content.

Overall, the availability of 3D audio is increasing across platforms as the technology matures. However, compatibility challenges remain in delivering consistent 3D audio experiences across devices. The audio format, mixing approach, and playback equipment all factor into successfully presenting 3D sound.

The Future of 3D Audio

The 3D audio market is expected to grow substantially in the coming years. According to Future Market Insights, 3D audio revenue is projected to more than double from 2023 to 2033, reaching about $14.75 billion by 2033. This growth will be driven by new technologies and applications in various industries.

In the consumer electronics market, companies like Samsung are developing new 3D audio technologies to provide deeper immersion in home entertainment. As described in a Samsung interview, next-generation 3D audio aims to replicate cinema-quality surround sound in the home. Object-based 3D audio formats like Dolby Atmos are enabling more customized and spatial audio experiences.

3D audio is also creating new opportunities in virtual and augmented reality. More immersive audio can lead to greater feelings of presence in VR and help users better parse auditory information in AR. 3D audio will likely play a major role as these technologies are adopted in gaming, live events, training simulations, and more.

Additionally, 3D audio holds promise for improving accessibility features like audio navigation and captions for blind or hard of hearing users. Directional audio cues can help users better understand their surroundings and engage with digital interfaces and content.

As 3D audio technologies progress, costs are likely to decrease, leading to wider adoption across consumer, professional, and industrial applications. More user-friendly production tools could also lower the barriers to creating 3D audio content. Overall, the future looks bright for 3D audio adding a new level of immersion and accessibility to many different experiences.