We now know that the shape of the head, ears and torso affects the frequency content of the sound received at the ears, so why don’t we perceive a change in timbre when a sound source moves?
The answer is a process called binaural decolouration. The auditory system is calibrated to our own specific head, ear and torso shape, so once a source has been localised, the auditory system can filter the perceived sound to compensate for any colouration caused by that shape.
This process is adaptive – for example if there are any changes in head, ear and torso shape, localisation and binaural decolouration will initially be compromised, but over time will be restored.
Binaural decolouration can be defeated when moving the head very quickly. Try listening to a steady sound source and quickly rotating your head – it looks strange (wait until no-one else is around…), but the results are interesting. It can also be defeated if the sound source has a wide spatial extent – move your head quickly next time you’re sitting on the beach listening to waves crashing on the shore.
When listening to traditional stereo audio through headphones, the sound is generally perceived to be located inside the head. Think about that for a moment: wearing headphones is the only time in life when you localise sound inside your head. It is a completely artificial phenomenon that does not occur elsewhere. Binaural audio exploits binaural and monaural cues to produce audio that is localised outside the head, as if the listener were really in the place where the sound was recorded.
Binaural audio can be produced relatively simply, either by recording with small microphones placed inside a person’s ears, or by placing microphones inside the ears of a specially made dummy head and recording with those. The ITD, ILD and timbral information are all captured by such recordings, so a listener localises the sound as it was captured in the original recording.
Another way of producing binaural audio is to synthesise it, which first requires a Head Related Transfer Function (HRTF) for each ear. HRTFs characterise how the ears receive sound from all different directions; they can be measured for a specific person or synthesised from measurements of many people. Audio can be filtered using the HRTFs for a certain position, so that a listener will hear the sound as if it were coming from that position.
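As a rough illustration of that filtering step, the sketch below convolves a mono signal with a pair of head related impulse responses (HRIRs, the time-domain form of an HRTF), one per ear. The function names and the HRIR values are invented for this example and are not measured data; a real system would load a measured HRTF set and would normally use a fast convolution rather than this direct loop.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution; output length is len(signal) + len(ir) - 1."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binauralise(mono, hrir_left, hrir_right):
    """Filter one mono signal with the left/right HRIR pair for a single direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIR pair (made-up numbers, NOT measured data) for a source on the
# listener's left: the right ear receives the sound two samples later and quieter.
HRIR_LEFT = [1.0, 0.3, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.5, 0.2]

click = [1.0, 0.0, 0.0]  # a single-sample impulse as the mono source
left, right = binauralise(click, HRIR_LEFT, HRIR_RIGHT)
print("left ear: ", left)
print("right ear:", right)
```

Even in this toy pair, the delay between the two outputs stands in for the ITD, the level difference for the ILD, and the shape of each impulse response for the timbral colouration.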
Even more sophisticated setups involve dynamic HRTF processing with the assistance of a head tracker. A HRTF-filtered audio source is played and then every time the head is rotated, a different HRTF is applied to the audio, so that the audio source will appear to be in the same location even when the head has been rotated. Such technology is incorporated into virtual reality to provide convincing and immersive experiences.
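The head-tracking logic can be sketched as follows: the source's direction relative to the head is its direction in the room minus the head's rotation, and the HRTF measured closest to that relative direction is the one to apply. The angle convention (degrees, 0 straight ahead, positive to the right), the measurement grid and all function names are assumptions made for this example. A practical renderer would probably also interpolate between neighbouring HRTFs and crossfade when switching.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth, head_yaw):
    """Direction of the source as seen from the rotated head."""
    return wrap_degrees(source_azimuth - head_yaw)


def nearest_measured_azimuth(target, measured_azimuths):
    """Pick the measured direction with the smallest angular distance to target."""
    return min(measured_azimuths, key=lambda az: abs(wrap_degrees(az - target)))


# Hypothetical measurement grid: one HRTF pair every 45 degrees.
MEASURED = [-180, -135, -90, -45, 0, 45, 90, 135]

source = 30.0  # source fixed in the room, 30 degrees to the right
for yaw in (0.0, 30.0, 90.0):  # head-tracker readings
    rel = relative_azimuth(source, yaw)
    chosen = nearest_measured_azimuth(rel, MEASURED)
    print(f"head yaw {yaw:5.1f} -> relative azimuth {rel:6.1f} -> use HRTF at {chosen}")
```

Because the relative azimuth changes by exactly the opposite of the head rotation, the source stays put in the room while the head moves.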
The unfortunate limitation of binaural audio is that any difference between the HRTF used and the HRTF of the listener can produce artefacts and reduce the effectiveness of the localisation. Small differences in ear and head shape can have drastic effects on how convincing the synthesised auditory localisation is. When HRTFs are measured or synthesised for general use, effort is made to produce a HRTF that will be similar to that of the vast majority of people. For the best results, it is preferable to measure the individual’s own HRTF.
Advances in binaural audio technology mean that the way we experience audio is changing, and when integrated into virtual reality, this technology could represent a real shift in how we live our lives.
Xi Engineering Consultants are at the forefront of acoustic and audio technology, having worked on acoustic optimisation for headphone manufacturers, analysed the effect of underwater noise on marine life and carried out extensive acoustic measurement campaigns. If you have a project in the acoustic or audio domain, don’t hesitate to contact us to find out how we can become your partner in innovation.