Auditory localisation and binaural audio

The term localisation refers to the ability of humans to determine the location of the source of a sound that they hear. Humans achieve this by using cues from the differences in sound received by each ear (binaural cues) and cues from the timbre of sound received by the ears (monoaural cues). Binaural audio recording and synthesis exploit these aspects of localisation to produce audio that imparts a greater sense of realism and immersion to the listener.

Binaural cues

Binaural cues are generally used to localise sound in the horizontal plane (i.e. is the sound in front of you, to your left, or to your right?). Binaural cues rely on the difference between sound received by each ear, and so require both ears. Consider an auditory event occurring a few feet away, to the right.

Sound arriving at the left ear arrives slightly later than sound arriving at the right ear, which gives the auditory system a cue that it is likely that the source of the sound is to the right. The difference in timing is called an Inter-aural Time Difference (ITD).
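To give a feel for the size of an ITD, here is a minimal sketch using Woodworth's spherical-head approximation, ITD = (r / c)(θ + sin θ). The head radius and speed of sound are assumed typical values rather than measurements, and the function name is purely illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 degrees C)
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def itd_woodworth(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a distant source and a spherical head.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    This simple form only covers 0 to 90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:>2} degrees: {itd_woodworth(az) * 1e6:.0f} microseconds")
```

Under these assumptions the ITD grows from zero for a source straight ahead to roughly 0.66 ms for a source directly to one side, which illustrates how small the timing differences that the auditory system works with are.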

Sound arriving at the left ear will also be quieter than the sound arriving at the right ear, both because it has had to travel further and because of shadowing caused by the head. This gives the auditory system a second cue that the source of the sound is likely to be to the right. This difference in amplitude is called an Inter-aural Level Difference (ILD).
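As a rough illustration, an ILD can be expressed in decibels as the ratio of the RMS levels at the two ears. The sketch below uses toy signals, where the left-ear signal is simply the right-ear signal at half amplitude; real head shadowing varies with frequency, which this deliberately ignores.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def ild_db(left, right):
    """ILD in dB; a positive value means the right ear is louder."""
    return 20.0 * math.log10(rms(right) / rms(left))


# Toy example: a 440 Hz tone, with the left ear at half the amplitude
right = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
left = [0.5 * s for s in right]

print(f"ILD: {ild_db(left, right):.1f} dB")
```

Halving the amplitude corresponds to a level difference of about 6 dB.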

Monoaural cues

Monoaural cues are useful for localising sound in the vertical plane (i.e. is the sound above or below?) and for judging whether the sound is close or distant. Sound is reflected and diffracted by the head, outer ear and even the torso on its way to the ear canal, and the effect differs depending on the direction the sound has come from. This means that sound received by the ear will have a different frequency content depending on its direction of arrival.

The auditory system uses these direction-dependent differences in frequency content as a cue to localise sound. This is especially useful in the vertical plane, where ITD and ILD are less useful for localisation.

Monoaural cues also contribute to the perception of how distant a sound is. If a familiar sound is quieter than expected, the auditory system estimates that it is far away; if it is louder, that it is close. Air attenuates high-frequency sound more than low-frequency sound over distance, which can also assist with estimating distance.
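The loudness part of the distance cue can be sketched with the inverse distance law for a point source in a free field, where sound pressure level falls by about 6 dB for every doubling of distance. This is an idealisation: it assumes no reflections and leaves out the frequency-dependent absorption by air.

```python
import math


def level_change_db(ref_distance_m: float, new_distance_m: float) -> float:
    """Change in sound pressure level when a point source moves
    from ref_distance_m to new_distance_m (free field, no reflections)."""
    return -20.0 * math.log10(new_distance_m / ref_distance_m)


for distance in (1.0, 2.0, 4.0, 8.0):
    change = level_change_db(1.0, distance)
    print(f"{distance:.0f} m: {change:+.1f} dB relative to 1 m")
```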

The shapes of the head, ears and torso are different for each person, which means that everyone’s auditory system uses these cues in a slightly different way.

The cone of confusion

No, that’s not an episode of The Twilight Zone.

The cone of confusion refers to a roughly conical surface extending out from either side of the head on which sound localisation can be ambiguous, as the auditory system can only rely on monoaural cues or dynamic changes such as head movement. This is because the ITD and ILD are very similar wherever a sound source is on the cone, and so the sound source could lie anywhere on it.
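A simplified model makes the ambiguity concrete. The sketch below treats the ears as two points in free space with no head between them (an assumption, as is the ear spacing), so the ITD depends only on how far the source direction leans towards the ear-to-ear axis. Sources in front, behind and above can then share exactly the same ITD.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
EAR_SPACING = 0.175     # m, assumed distance between the ears


def itd_two_point(azimuth_deg: float, elevation_deg: float = 0.0) -> float:
    """Far-field ITD in seconds for two point receivers with no head.

    azimuth_deg: 0 is straight ahead, 90 is directly right, 180 is behind.
    elevation_deg: 0 is ear level, 90 is directly overhead.
    A positive result means the sound reaches the right ear first.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    lateral = math.sin(az) * math.cos(el)  # component along the ear-to-ear axis
    return EAR_SPACING / SPEED_OF_SOUND * lateral


positions = {
    "front right": (30.0, 0.0),
    "back right": (150.0, 0.0),
    "above right": (90.0, 60.0),
}
for name, (az, el) in positions.items():
    print(f"{name}: {itd_two_point(az, el) * 1e6:.0f} microseconds")
```

All three positions lie on the same cone around the ear-to-ear axis, so this model gives each of them an ITD of about 255 microseconds. A real head changes the numbers but not the underlying ambiguity.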

Binaural decolouration

We now know that the shape of the head, ear and torso affects the frequency content of sound received by the ears, so why don’t we perceive a change in the timbre of sound when the position of a source is moved?

The answer lies in a process called binaural decolouration. The auditory system is calibrated to our own specific head, ear and torso shape, and so when a source is localised, it can filter the perceived sound to compensate for any colouration caused by that shape.

This process is adaptive. For example, if the shape of the head, ears or torso changes, localisation and binaural decolouration will initially be compromised, but will be restored over time.

Binaural decolouration can be defeated when moving the head very quickly. Try listening to a steady sound source and quickly rotating your head – it looks strange (wait until no-one else is around…), but the results are interesting. It can also be defeated if the sound source has a wide spatial extent – move your head quickly next time you’re sitting on the beach listening to waves crashing on the shore.

Binaural audio

When listening to traditional stereo audio through headphones, the sound is generally perceived to be located inside the head. Think about that for a moment: the only time in life when you localise sound inside your head is when you are wearing headphones. It is a completely artificial phenomenon that does not occur elsewhere. Binaural audio exploits binaural and monoaural cues to produce audio which is localised outside the head, as if the listener were really in the place where the sound was recorded.

Binaural audio can be produced relatively simply, by recording with small microphones placed inside a person’s ears, or inside the ears of a specially made dummy head. The ITD, ILD and timbral information are all captured by such recordings, and a listener will localise sound in much the same way as the person or dummy head that made the original recording would have heard it.

Another way of producing binaural audio is to synthesise it, which first requires a Head Related Transfer Function (HRTF) for each ear. HRTFs characterise how the ears receive sound from all different directions, and can be measured for a specific person or synthesised from measurements of many people. Audio can be filtered using the HRTFs for a certain position, so that a listener will hear the sound as if it were coming from that position.
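In the time domain, filtering with an HRTF amounts to convolving the audio with the corresponding head-related impulse response (HRIR) for each ear. The sketch below shows that step with a plain convolution. The impulse responses are made-up toy values standing in for measured data, and the function names are illustrative rather than taken from any particular library.

```python
def convolve(signal, kernel):
    """Direct-form convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binauralise(mono, hrir_left, hrir_right):
    """Filter one mono signal with a pair of HRIRs, one per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy HRIRs for a source on the right: the right ear hears
# the sound immediately at full level, the left ear hears it three
# samples later at half level. Measured HRIRs are far longer and also
# shape the frequency content.
hrir_right = [1.0, 0.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.0, 0.5]

mono = [1.0, -1.0, 0.5]
left, right = binauralise(mono, hrir_left, hrir_right)
print("left: ", left)
print("right:", right)
```

Even these toy responses reproduce an ITD (the three-sample delay) and an ILD (the halved level). In practice a library routine or an FFT-based convolution would be used for speed.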

Even more sophisticated setups involve dynamic HRTF processing with the assistance of a head tracker. An HRTF-filtered audio source is played, and every time the head is rotated a different HRTF is applied to the audio, so that the audio source appears to stay in the same location even when the head has been rotated. Such technology is incorporated into virtual reality to provide convincing and immersive experiences.
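The core of the head-tracked case can be sketched as follows: subtract the head rotation from the source direction, then pick the closest HRTF that has been measured. The 15-degree measurement grid and the sign conventions (positive angles to the right) are assumptions for illustration, and only rotation about the vertical axis is handled.

```python
def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Direction of a fixed source as seen from the rotated head."""
    return wrap_degrees(source_az_deg - head_yaw_deg)


def nearest_measured_azimuth(target_deg: float, measured_deg):
    """Pick the measured HRTF direction closest to the target direction."""
    return min(measured_deg, key=lambda m: abs(wrap_degrees(m - target_deg)))


# Hypothetical HRTF set measured every 15 degrees around the listener
MEASURED = list(range(-180, 180, 15))

source_az = 30.0  # source fixed in the room, 30 degrees to the right
for head_yaw in (0.0, 30.0, 90.0):
    rel = relative_azimuth(source_az, head_yaw)
    chosen = nearest_measured_azimuth(rel, MEASURED)
    print(f"head yaw {head_yaw:5.1f}: source at {rel:6.1f}, use HRTF for {chosen}")
```

A real renderer would typically interpolate between neighbouring HRTFs and crossfade when switching, to avoid audible jumps as the head moves.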

The unfortunate limitation of binaural audio is that any difference between the HRTF used and the HRTF of the listener can produce artefacts and reduce the effectiveness of the localisation. Small differences in ear and head shape can have drastic effects on how convincing the synthesised auditory localisation is. When HRTFs are measured or synthesised for general use, effort is made to produce an HRTF that will be similar to that of the vast majority of people. For the best results, it is preferable to measure the individual’s HRTF.

And beyond…

Advances in binaural audio technology mean that the way we experience audio is changing and, when integrated into virtual reality, could represent a real shift in how we live our lives.

Xi Engineering Consultants are at the forefront of acoustic and audio technology, having worked on acoustic optimisation for headphone manufacturers, analysed the effect of underwater noise on marine life and carried out extensive acoustic measurement campaigns. If you have a project in the acoustic or audio domain, don’t hesitate to contact us to find out how we can become your partner in innovation.