Designing believable 3D sound with binaural audio
Whether you are building next-generation headphones, AR or VR experiences, in-car audio or safety-critical systems, one thing matters more than most people realise – how convincingly your product tells the brain where sounds are coming from.
At Xi Engineering Consultants, we work with audio and technology companies to understand how people localise sound in three dimensions, then use that insight to design, simulate and test systems that feel natural, immersive and reliable.

Why spatial hearing matters for modern audio
Auditory localisation is the brain’s ability to work out where a sound source is in space – not just left or right, but also up or down, in front or behind, and near or far. It does this by combining tiny timing and level differences between the ears with subtle changes in timbre caused by the head, ears and body.
For product and experience teams, that has direct implications:
- In VR and AR, spatial audio is a major part of whether an environment feels convincing or not.
- In headphones and hearables, believable out-of-head sound makes content more natural and less tiring to listen to.
- In automotive and safety applications, clear spatial cues help users react faster and with more confidence.
If your system fights against how people naturally localise sound, the result can feel flat, confusing or fatiguing, even if the raw audio quality is high.
Binaural cues – how we hear left, right and straight ahead
Binaural cues rely on the fact that we have two ears, separated by the head. They are most important for localising sound in the horizontal plane – whether something is to your left, right or in front.
Two main effects are at work:
- Interaural Time Difference (ITD)
A sound arriving from the right reaches the right ear slightly earlier than the left ear. The brain can detect time differences of just tens of microseconds and uses them as a strong cue to direction.
- Interaural Level Difference (ILD)
The head acts as a partial barrier to sound, especially at higher frequencies. A sound on the right will be slightly quieter at the left ear because it has travelled further and been shadowed by the head. That level difference also points to the sound being on the right.
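To get a feel for the magnitudes involved, the ITD for a spherical head can be estimated with the classic Woodworth approximation. This is a minimal sketch, not a production model: the head radius of 8.75 cm and speed of sound of 343 m/s are standard textbook assumptions, not measured values.

```python
import math

def itd_woodworth(azimuth_deg, head_radius=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a far-field
    source, using the Woodworth spherical-head model:
    ITD = (a / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))

# A source straight ahead gives zero ITD; a source directly to one
# side (90 degrees) gives the maximum, roughly 650 microseconds.
for az in (0, 30, 60, 90):
    print(az, round(itd_woodworth(az) * 1e6), "us")
```

Even the maximum ITD is well under a millisecond, which illustrates how fine the timing resolution of the auditory system has to be.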
Headphone and spatial audio systems that respect and reproduce realistic ITDs and ILDs can place sounds reliably around the listener. Systems that ignore them often leave everything sounding as if it is inside the head.
Monaural cues – height, distance and timbre
Not all localisation depends on having two ears. Monaural cues work with a single ear and are especially useful for:
- Vertical localisation – whether a sound is above, below or at ear height
- Front and back discrimination
- Perceived distance
As sound approaches the ear, it is filtered by the complex shape of the torso, head and outer ear. Different arrival angles produce different patterns of amplification and attenuation across frequency. The auditory system learns these patterns over a lifetime and uses them as fingerprints for direction and distance.
For distance, the brain also takes into account:
- Overall level of a familiar sound
- The balance between direct sound and reflections
- Stronger attenuation of high frequencies with increasing distance
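The last two distance cues above can be sketched numerically: spreading loss affects all frequencies equally, while air absorption penalises high frequencies more as distance grows. The absorption coefficient below is an illustrative placeholder that scales with frequency squared, not an ISO 9613-1 calculation.

```python
import math

def spreading_loss_db(distance_m, ref_m=1.0):
    """Inverse-square (free-field) level drop relative to ref_m, in dB."""
    return 20.0 * math.log10(distance_m / ref_m)

def hf_absorption_db(distance_m, freq_hz, alpha_db_per_m_at_10k=0.1):
    """Toy air-absorption model: attenuation grows with distance and
    roughly with frequency squared. Coefficient is illustrative only."""
    alpha = alpha_db_per_m_at_10k * (freq_hz / 10_000.0) ** 2
    return alpha * distance_m

# At 100 m, a 1 kHz tone loses ~40 dB to spreading but almost nothing
# to absorption; a 10 kHz tone loses a further ~10 dB to the air alone.
for d in (1, 10, 100):
    print(d, round(spreading_loss_db(d) + hf_absorption_db(d, 1_000), 1),
          round(spreading_loss_db(d) + hf_absorption_db(d, 10_000), 1))
```

The growing gap between the two frequencies with distance is exactly the kind of spectral cue the brain reads as "far away".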
This is why a personalised approach to spatial audio can make such a difference – your monaural cues are shaped by your anatomy.
The cone of confusion – when cues disagree
There are regions in space where binaural cues become ambiguous. One of the most important is known as the cone of confusion.
For a given position to the side of the head, there is a roughly conical surface of points that produce almost identical ITDs and ILDs. A sound source located anywhere on that cone can give the same binaural cues, so the brain has to rely more heavily on monaural information and on changes in cues when you move your head.
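The ambiguity is easy to demonstrate with simple geometry. In this sketch the ears sit on the interaural axis (head shadowing is ignored), and three sources – in front, behind and above the listener – share the same offset along that axis and the same radius from it, so pure path-difference timing cannot tell them apart. Positions and ear spacing are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
EAR_L = (-0.0875, 0.0, 0.0)  # ears placed on the interaural (x) axis
EAR_R = (0.0875, 0.0, 0.0)

def itd_us(src):
    """Path-difference ITD in microseconds (free field, no head shadow)."""
    return (math.dist(src, EAR_L) - math.dist(src, EAR_R)) / SPEED_OF_SOUND * 1e6

# Three sources on the same cone of confusion: identical x-offset and
# identical distance from the interaural axis, but front, back and above.
front = (1.0, 1.0, 0.0)
back = (1.0, -1.0, 0.0)
above = (1.0, 0.0, 1.0)
print(round(itd_us(front)), round(itd_us(back)), round(itd_us(above)))
```

All three positions yield the same ITD, which is why monaural spectral cues and head movement are needed to break the tie.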
For audio product design, this is a reminder that:
- Static cues alone may not be enough for robust localisation in every direction.
- Allowing or tracking natural head movement can resolve ambiguity and improve realism.
- Testing systems with realistic head motion and source positions is essential.
Binaural decolouration – why you do not hear constant timbre changes
Given how strongly your head, ears and torso colour incoming sound, you might expect every change in source position to produce a noticeable change in timbre. In practice, this is not what we perceive.
The auditory system performs a form of binaural decolouration. It is effectively calibrated to your anatomy and learns to treat the colouration from your head and ears as normal. When it has localised a source, it can internally compensate for some of that frequency shaping, helping you hear the underlying sound consistently.
This process is:
- Adaptive – if your anatomy changes, localisation and decolouration may initially suffer, then improve over time.
- Sensitive to speed – rapid head movements or very extended sound sources, like breaking waves, can reveal the underlying colouration in interesting ways.
For engineers, this underlines why generic one-size-fits-all processing can sometimes feel slightly off, and why user adaptation over time can matter.
From theory to practice – how binaural audio is created
Binaural audio techniques aim to reproduce at the ears the same cues that would be present if you were listening in a real space.
There are two main approaches.
1. Recording with realistic ears
Binaural recordings can be made by placing small microphones:
- In a real person’s ear canals
- In the ears of a specially built dummy head
These recordings automatically capture ITDs, ILDs and the detailed timbral changes introduced by the head and torso for that specific geometry. When played back over headphones, the listener often perceives sounds as being located outside the head, in the space where they were recorded.
2. Synthesis with Head Related Transfer Functions
For more flexible applications, audio is often synthesised using Head Related Transfer Functions (HRTFs).
- An HRTF describes how sound from a specific direction is filtered by the head and body before reaching each ear.
- By filtering a mono signal with the appropriate left and right ear HRTFs, you can create a binaural version that appears to come from that direction.
- HRTFs can be measured for an individual or taken from databases built from many people.
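The filtering step above amounts to convolving the mono signal with a left/right impulse-response pair. A minimal sketch follows; the "HRIRs" here are toy filters (a pure delay for ITD plus attenuation for ILD) standing in for measured HRTF data, and the sample values are invented for illustration.

```python
def convolve(signal, ir):
    """Direct-form FIR convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def binaural_render(mono, hrir_left, hrir_right):
    """Return (left, right) channels for headphone playback."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIR pair for a source on the listener's right: the left ear hears
# the sound later (leading zeros = ITD in samples) and quieter (ILD).
hrir_right_ear = [1.0]
hrir_left_ear = [0.0] * 20 + [0.6]  # ~20-sample delay, 0.6x level

mono = [0.0, 1.0, 0.5, -0.5, 0.0]
left, right = binaural_render(mono, hrir_left_ear, hrir_right_ear)
```

A real renderer would use measured, direction-dependent HRIR pairs and swap or interpolate them as head tracking updates the source direction.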
With head tracking, the system can update the HRTF in real time as the listener moves, keeping virtual sound sources fixed in space and greatly enhancing immersion in VR or AR environments.
The engineering challenge – individual fit and real-world devices
The main limitation of binaural synthesis is that no two listeners have exactly the same anatomy.
If there is a mismatch between the HRTFs used and the listener’s own HRTFs, localisation can become less precise and artefacts may appear. Small differences in ear shape or head size can make surprisingly large differences in perceived realism.
For product teams, that creates several challenges:
- Choosing or designing HRTFs that work reasonably well for most listeners
- Understanding which aspects of HRTF mismatch are most tolerated and which are most critical
- Accounting for headphone and earbud fit, which can modify the effective HRTF again
- Balancing computational cost, latency and perceptual quality in real time systems
This is exactly where targeted measurement, simulation and listening tests can de-risk design decisions.
How Xi supports spatial audio and acoustic innovation
Xi Engineering Consultants works across the acoustic and audio chain, from the fundamentals of human perception to the performance of real-world devices.
For teams developing binaural and spatial audio systems, we can help you to:
- Measure and model HRTFs and acoustic behaviour
Design and run measurement campaigns, build models of head, torso and device acoustics, and explore how design choices affect localisation cues.
- Evaluate headphone and hearable performance
Assess how different designs interact with the ear, including leakage, fit variation and frequency response, and link these to perceived spatial performance.
- Prototype and test spatial audio rendering strategies
Use simulation and controlled listening tests to compare different HRTFs, rendering algorithms and head tracking approaches before committing to implementation.
- Translate findings into actionable design guidance
Turn complex acoustic and psychoacoustic data into clear recommendations your product, UX and engineering teams can use.
We can engage on focused studies around a specific problem or as an ongoing partner across generations of a product line.
