Spatial audio

From Things and Stuff Wiki
Jump to navigation Jump to search


  • - a listener's ability to identify the location or origin of a detected sound in direction and distance. It may also refer to the methods in acoustical engineering to simulate the placement of an auditory cue in a virtual 3D space (see binaural recording, wave field synthesis). The sound localization mechanisms of the mammalian auditory system have been extensively studied. The auditory system uses several cues for sound source localization, including time- and level-differences (or intensity-difference) between both ears, spectral information, timing analysis, correlation analysis, and pattern matching.

  • - composed music that intentionally exploits sound localization. Though present in Western music from biblical times in the form of the antiphon, as a component specific to new musical techniques the concept of spatial music (Raummusik, usually translated as "space music") was introduced as early as 1928 in Germany. The term spatialisation is connected especially with electroacoustic music to denote the projection and localization of sound sources in physical or virtual space or sound's spatial movement in space.

  • - an audio microphone composed of four closely spaced subcardioid or cardioid (unidirectional) microphone capsules arranged in a tetrahedron. It was invented by Michael Gerzon and Peter Craven, and is a part of, but not exclusive to, Ambisonics, a surround sound technology. It can function as a mono, stereo or surround sound microphone, optionally including height information.

  • SpatDIF - Spatial Sound Description Interchange Format. SpatDIF is a format that describes spatial sound information in a structured way, in order to support real-time and non-real-time applications. The format serves to describe, store and share spatial audio scenes across audio applications and concert venues.
  • - a plugin (Mac AU/VST and VST Windows format) designed to compose multichannel space. It allows the user to spatialize the sound in 2D (up to 16 speakers) or in 3D (up to 128 speakers) under a dome of speakers (with the ServerGRIS, under development). SpatGRIS is a fusion of two former plugins by the GRIS: OctoGRIS and ZirkOSC with a lot of new festures.

  • - the original name for a positional three-dimensional (3D) sound processing algorithm from QSound Labs that creates 3D audio effects from multiple monophonic sources and sums the outputs to two channels for presentation over regular stereo speakers. QSound was eventually re-dubbed "Q1" after the introduction of "Q2", a positional 3D algorithm for headphones. Later multi-speaker surround system support was added to the positional 3D process, the QSound positional 3D audio process became known simply as "Q3D". QSound was founded by Larry Ryckman (CEO), Danny Lowe and John Lees. Jimmy Iovine served as SVP of Music and Shelly Yakus as VP of Audio Engineering in its formative years.


  • - monophonic sound reproduction (often shortened to mono) is sound intended to be heard as if it were emanating from one position. This contrasts with stereophonic sound or stereo, which uses two separate audio channels to reproduce sound from two microphones on the right and left side, which is reproduced with two separate loudspeakers to give a sense of the direction of sound sources. In mono, only one loudspeaker is necessary, but, when played through multiple loudspeakers or headphones, identical signals are fed to each speaker, resulting in the perception of one-channel sound "imaging" in one sonic space between the speakers (provided that the speakers are set up in a proper symmetrical critical-listening placement). Monaural recordings, like stereo ones, typically use multiple microphones fed into multiple channels on a recording console, but each channel is "panned" to the center. In the final stage, the various center-panned signal paths are usually mixed down to two identical tracks, which, because they are identical, are perceived upon playback as representing a single unified signal at a single place in the soundstage. In some cases, multitrack sources are mixed to a one-track tape, thus becoming one signal. In the mastering stage, particularly in the days of mono records, the one- or two-track mono master tape was then transferred to a one-track lathe intended to be used in the pressing of a monophonic record. Today, however, monaural recordings are usually mastered to be played on stereo and multi-track formats, yet retain their center-panned mono soundstage characteristics.

Monaural sound has largely been replaced by stereo sound in most entertainment applications, but remains the standard for radiotelephone communications, telephone networks, and audio induction loops for use with hearing aids. FM radio stations broadcast in stereo, while most AM radio stations broadcast in mono. (Although an AM stereo broadcast standard exists, few AM stations are equipped to use it.) A few FM stations—notably talk-radio stations—choose to broadcast in monaural because of the slight advantage in signal strength and bandwidth the standard affords over a stereophonic signal of the same power.


  • - or, more commonly, stereo, is a method of sound reproduction that creates an illusion of multi-directional audible perspective. This is usually achieved by using two or more independent audio channels through a configuration of two or more loudspeakers (or stereo headphones) in such a way as to create the impression of sound heard from various directions, as in natural hearing. Thus the term "stereophonic" applies to so-called "quadraphonic" and "surround-sound" systems as well as the more common two-channel, two-speaker systems. It is often contrasted with monophonic, or "mono" sound, where audio is heard as coming from one position, often centered in the sound field (analogous to a visual field). In the 2000s, stereo sound is common in entertainment systems such as broadcast radio and TV, recorded music and the cinema.

  • - the aspect of sound recording and reproduction concerning the perceived spatial locations of the sound source(s), both laterally and in depth. An image is considered to be good if the location of the performers can be clearly located; the image is considered to be poor if the location of the performers is difficult to locate. A well-made stereo recording, properly reproduced, can provide good imaging within the front quadrant; a well-made Ambisonic recording, properly reproduced, can offer good imaging all around the listener and even including height information.

  • - microphone technique used to record stereo sound.ORTF setup.It was devised around 1960 at the Office de Radiodiffusion Télévision Française (ORTF) at Radio France.ORTF combines both the volume difference provided as sound arrives on- and off-axis at two cardioid microphones spread to a 110° angle, as well as the timing difference as sound arrives at the two microphones spaced 17 cm apart.
  • - a method of capturing stereo sound.The Nederlandse Omroep Stichting (NOS, English: Dutch Broadcast Foundation) found a stereo main microphone system by a number of practical attempts in the 1960s. This system resulted in a quite even distribution of the phantom sources (hearing event direction) on the stereo loudspeaker base, with two small cardioid characteristic microphones, and a recording angle of the microphone system of ±40.5° = 81°. This system got empirical an axle angle of α = ±45° = 90° and a microphone distance (microphone basis) of a = 30 cm.
  • - the name for a stereo recording technique invented by Alan Blumlein for the creation of recordings that, upon replaying through headphones or loudspeakers, recreate the spatial characteristics of the recorded signal.The pair consists of an array of two matched microphones that have a bi-directional (figure 8) pickup pattern. They are positioned 90° from each other. Ideally, the transducers should occupy the same physical space; since this cannot be achieved, the microphone capsules are placed as close to each other as physically possible, generally with one centered directly above the other. The array is oriented so that the line bisecting the angle between the two microphones points towards the sound source to be recorded (see diagram). The pickup patterns of the pair, combined with their positioning, delivers a high degree of stereo separation in the source signal as well as the room ambiance.
  • - a sound-absorbing disk placed between two microphones to create an acoustic "shadow" from one microphone to the other. The resulting two signals can possibly produce a pleasing stereo effect. A matching pair of small-diaphragm omnidirectional microphones is always used with a Jecklin disk.


  • - the process of blending the left and right channels of a stereo audio recording. It is generally used to reduce the extreme channel separation often featured in early stereo recordings (e.g., where instruments are panned entirely on one side or the other), or to make audio played through headphones sound more natural, as when listening to a pair of external speakers.

  • [1] Headphones have extreme stereo separation--the right ear doesn't get to hear much of what's going on on the left. This leads to the impression the music's coming from inside your head, and sounds especially weird when instruments are panned hard to one side or the other. Crossfeed filters aim to fix this by letting the channels mix a little, but in a controlled way. The goal is to mimic what happens naturally when listening to music on speakers.

  • CIP Audio Player - Java. In a normal stereo setup the spot where the stereo image really creates the best sound stage is very small. By calculating a third signal for a center speaker it is possible to enhance the stereo soundstage for more than one listener.

Leslie speaker

  • - a combined amplifier and two-way loudspeaker that projects the signal from an electric or electronic instrument, while modifying the sound by rotating the loudspeakers. It is most commonly associated with the Hammond organ, though it was later used for the guitar and other instruments. A typical Leslie speaker contains an amplifier, and a treble and bass speaker—though specific components depend upon the model. A musician controls the Leslie speaker by either an external switch or pedal that alternates between a slow and fast speed setting, known as "chorale" and "tremolo".

to sort

  • BLS1 is a digital realisation of the 'Blumlein Shuffler', invented by Alan Blumlen in the early 1930s and analysed in detail by Michael Gerzon in a paper presented at the 1993 AES Convention in San Francisco.

  • MONSTR - a multiband stereo imaging plugin, available for Windows, Mac, and Linux. It allows the user to control the stereo width of a sound in 3 different frequency bands, and so can be used to perform common tasks such as narrowing the bass frequencies while adding width to the highs, allowing fine control over the stereo image of your mix.

  • Holophon - a set of tools for the programming and real-time manipulation of sound trajectories across different speakers. Its main development is the Holo-Edit trajectory editor. It’s a graphical and algorithmic editor of sound trajectories. Holo-Edit makes it possible to draw and graphically edit trajectories across a complex sound system. It’s also possible to program those trajectories with different automatic functions. HoloEdit is a set of graphical editors and algorithmic functions for creating and manipulating sounds in space. This software allows for the precise positioning of multiple sounds in time and space (defined by a set of speakers). In order to do so, it associates sounds to trajectories (a set of points defined by their position in space (x, y, z) and their date). HoloEdit also supports the SDIF format so that it is possible to generate/transform sound trajectories from SDIF data. masOS, plus Holoboule for pd-extended.

  • - given a stereo audio stream, “expands” it to three channels. Everything that is present in both input channels will be in the center channel of the output, and what is specific to each channel will be in the corresponding output channel.

Windows / Mac

  • MStereoExpander - offers expansion based on either actual samples or on delay, and provides stereo field correction to increase or reduce the clarity of the spatial differences between channels. It is fully mono-compatible.
  • MStereoProcessor - an advanced mastering multiband stereo analyzer and enhancer plugin, which lets you easily control the stereo image and the necessary perception of depth and space.
  • A1StereoControl - expand or limit the STEREO WIDTH of your tracks using only one single knob. This powerful technique can be used on single tracks or groups tracks while mixing or even on a master bus in final mastering situations. Windows/Mac
  • Proximity - an easy to use distance “pan-pot” based on several psycho-acoustic models. The idea is to give mixing engineer a reliable tool which allows him to manipulate the “depth” of several sound source in a straight forward and convincing manner.
  • Voxengo MSED - a professional audio encoder-decoder plugin for mid-side processing which is able to encode (split) the incoming stereo signal into two components: mid-side pair, and vice versa: decode mid-side signal pair into stereo signal. MSED is also able to work in the “inline” mode with the ability to adjust mid and side channels’ gain and panning without the need of using two plugin instances in sequence. MSED can be used to flip the phase of the mid and side channels by 180 degrees, and swap the stereo channels, and to extract the mid or side channel. MSED features the “plasma” vector scope, stereo correlation and balance meters which make it easier to monitor the stereo information present in the audio signal.
  • Voxengo Stereo Touch - This professional audio plugin implements a classic technique of transforming a monophonic track into spacious stereophonic track by means of mid/side coding technique. Stereo Touch is most effective on monophonic sounds without overly sharp transients: it works great for both acoustic and electric/overdriven guitars, synthetic pad sounds and even vocals. By means of this plugin you can easily get spacious and even “surround” sounding tracks, without utilizing a double-tracked recording technique.
  • Upstereo - A FREE stereo enhancer. Stereo width slider going from mono to wide, bringing the stereo image out and towards the listener. Loudness control boost. Loudness overdrive option. Subtle Air & Bass boosters to lift and help the audio 'breath'. Movable 3D interface, with changeable colours and light, positions. Very low CPU usage.


Surround sound

  • - PanJack implements a real-time surround-sound panorama mixer. one or more audio input(s), two or more audio outputs / speakers, control via OSC or MIDI. (optional) bcf2000 fader/pan control. Note: Jack-Transport needs to be rolling in order for panjack to process audio. It creates jack-audio input and output ports and routes audio with a latency of 1 jack-cycle, applying a amplifications depending on faders and panorama-gain. The panorama-gain settings can be adjusted manually for each output channel or be modified indirectly using built-in maths for 2D (angle, separation) or X/Y-distance panning. Furthermore there is built-in functionality to automate whirl/leslie-rotate effects. panjack itself does not provide sequencer capabilities. Yet this feature can be archived easily by controlling panjack via OSC and any OSC-sequencer


  • - a series of multichannel audio technologies owned by DTS, Inc. (formerly known as Digital Theater Systems, Inc.), an American company specializing in digital surround sound formats used for both commercial/theatrical and consumer grade applications. It was known as The Digital Experience until 1995. DTS licenses its technologies to consumer electronics manufacturers.

Dolby Atmos

  • - allows up to 128 audio tracks plus associated spatial audio description metadata (most notably, location or pan automation data) to be distributed to theaters for optimal, dynamic rendering to loudspeakers based on the theater capabilities. Each audio track can be assigned to an audio channel, the traditional format for distribution, or to an audio "object." Dolby Atmos by default, has a 10-channel 7.1.2 bed for ambience stems or center dialogue, leaving 118 tracks for objects. Dolby Atmos home theaters can be built upon traditional 5.1 and 7.1 layouts. For Dolby Atmos, the nomenclature differs slightly: a 7.1.4 Dolby Atmos system is a traditional 7.1 layout with four overhead or Dolby Atmos enabled speakers.


  • - a sound source in an Ambiophonic system, made by two closely spaced loudspeakers that ideally span 10-30 degrees. Thanks to the cross-talk cancellation method, a stereo dipole can render an acoustic stereo image nearly 180° wide (single stereo dipole) or 360° (dual or double stereo dipole).


  • - a full-sphere surround sound technique: in addition to the horizontal plane, it covers sound sources above and below the listener. Unlike other multichannel surround formats, its transmission channels do not carry speaker signals. Instead, they contain a speaker-independent representation of a sound field called B-format, which is then decoded to the listener's speaker setup. This extra step allows the producer to think in terms of source directions rather than loudspeaker positions, and offers the listener a considerable degree of flexibility as to the layout and number of speakers used for playback.

  • Ambisonics | SpringerLink - A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality, open access book

  • - Researchers working on very high-order systems found no straightforward way to extend the traditional formats to suit their needs. Furthermore, there was no widely accepted formulation of spherical harmonics for acoustics, so one was borrowed from chemistry, quantum mechanics, computer graphics, or other fields, each of which had subtly different conventions. This led to an unfortunate proliferation of mutually incompatible ad-hoc formats and much head-scratching.
  • Ambisonics Component Ordering - The two primary component ordering formats for ambisonics are Furse-Malham, commonly called FuMa, and Ambisonics Channel Number, commonly called ACN. As seen in the following image, the former uses a lettered notation that - following alphabetical order per grouping - starts with the W (omni) channel, moves to its lower right, then its lower left, and then its lower center; then it moves to the next order and starts at the R, moves to its right, then to its left, then the further right, then the further left; then it moves to the next order and follows a similar pattern. On the other hand, the latter is numbered in a much easier to follow left-to-right order.

  • Spatialisation - Stereo and Ambisonic - Richard W.E. Furse, "This chapter discusses an approach to generating stereo or Ambisonic sound images using Csound. The focus here is not on the acoustics of the human head but on modelling sound in an acoustic space. We will use Csound to produce a ‘virtual’ acoustic space in which to move sounds and make recordings."

  • ARCADE - a patented spatial audio codec that allows encoding scene-based 3D audio over stereo with no additional metadata required. It allows encoding sources with height in a fully spherical manner. Its decoder is able to decode to virtually any 3D or 2D audio format, for example first-order or higher-order spherical harmonics (FOA, HOA), VBAP, Surround, Binaural with or without head-tracking etc. The decoder also works as an upmixer for any stereo content, to any of the formats it can decode to.


to sort

  • - repository contains easy to use stereo decoders for high order ambisonics using the ambisonic toolkit for SuperCollider that are automatically set up as persistent main effects on the main outputs of SuperCollider. They are respawned when the user hard stops the sound.This quark depends on the ambisonic toolkit and requires a full installation of it.

  • mcfx – multichannel audio plug-in suite, VST plug-ins for MacOS, Windows and Linuxm (mcfx_convolver, mcfx_delay, mcfx_filter, mcfx_gain_delay, mcfx_meter), these plug-ins are very handy if you want to process multiple channels in one go for example; multiple loudspeaker setups, Ambisonics (see ambiX); Microphone array post productions (eg. Eigenmike®)

  • - a set of free, open source and modular software tools for sound spatialization. It is comprised of 4 different types of software: spatialization renderers: standalone applications that render spatialized audio using ambisonics or amplitude panning; spatialization interfaces: standalone interfaces that generate spatial information to control the spatialization renderers via OSC; plugins: audio unit plugin and max for live devices to control the spatialization renderers via OSC; max objects: a library of objects for spatialization using ambisonics or amplitude panning in Cycling’74 Max.

"In short, it takes so-called 'pair-wise' panning - i.e. the panning of localised sounds between two loudspeakers - and does a little more math to extend it into triplet-wise panning. The three loudspeakers are arranged in a triangle layout. Localised sounds no longer just pan horizontally, between two positions, but now pan vertically too. This change means that we have extended from 1-dimensional movement into 2-dimensional movement.

"As Ville's diagram shows, as you add more triangles you can extend into the 3rd dimension too, by creating a 'mesh' similar to the polygons that describe 3D space in computer games. The amplitude of any sound 'moving through' the space is calculated for each of the nearest three speakers. The equation takes into account distance from the loudspeaker, and so VBAP differentiates from Ambisonics and irregular loudspeaker layouts can be supported. However there still needs to be a 'mesh' based on triangles, as any individual sound can only exist between the nearest three points. The emphasis here is still on satisfying a 'sweet spot', a localised and immobile audience. In this respect, VBAP is similar to Ambisonics."

  • - documentations and implementations of the Vector Base Amplitude Panning (VBAP). The VBAP is a spatialization techniques created by Ville Pulkki in the late 90's. For further information see the references. The VBAP is available as a C library with an implementation as externals for Pure Data (and also as abstractions).

  • - A Matlab implementation of the Higher-order Spatial Impulse Response Rendering (HO-SIRR) algorithm. An alternative approach for reproducing Ambisonic IRs over loudspeakers.

Wave field synthesis


SoundScape Renderer


  • WONDER - a software suite for using Wave Field Synthesis and Binaural Synthesis. It's primary platform is Linux, but it can be used under OSX too.


  • - a method of recording sound that uses two microphones, arranged with the intent to create a 3-D stereo sound sensation for the listener of actually being in the room with the performers or instruments. This effect is often created using a technique known as "dummy head recording", wherein a mannequin head is outfitted with a microphone in each ear. Binaural recording is intended for replay using headphones and will not translate properly over stereo speakers. This idea of a three dimensional or "internal" form of sound has also translated into useful advancement of technology in many things such as stethoscopes creating "in-head" acoustics and IMAX movies being able to create a three dimensional acoustic experience.

  • - a cognitive process that involves the "fusion" of different auditory information presented binaurally, or to each ear. In humans, this process is essential in understanding speech as one ear may pick up more information about the speech stimuli than the other. The process of binaural fusion is important for computing the location of sound sources in the horizontal plane (sound localization), and it is important for sound segregation. Sound segregation refers the ability to identify acoustic components from one or more sound sources. The binaural auditory system is highly dynamic and capable of rapidly adjusting tuning properties depending on the context in which sounds are heard. Each eardrum moves one-dimensionally; the auditory brain analyzes and compares movements of both eardrums to extract physical cues and synthesize auditory objects.

  • Panagement - spatialization toolbox by Auburn Sounds

Spatial Audio Framework




  • - open-source, higher-order Ambisonics, 360° video player with acoustic zoom. HOAST360 dynamically outputs a binaural audio stream from up to fourth-order Ambisonics audio content.


  • TASCAR - or Toolbox for acoustic scene creation and rendering a collection of tools used for creation of spatially dynamic acoustic scenes in various render formats, e.g., higher order Ambisonics or VBAP. The toolbox is developed for applications in the context of hearing research and hearing aid evaluation.




  • - a custom open-source C++ library developed within the EU-funded project 3D Tune-In. The Toolkit provides a high level of realism and immersiveness within binaural 3D audio simulations, while allowing for the emulation of hearing aid devices and of different typologies of hearing loss.



  • - provides a framework for building multichannel installations using WebRTC. With Spatify each client becomes a JACK client on the host machine, any audio sent to that client's ports will be streamed over WebRTC to the client. The original idea behind this project was to use multiple smartphones as individual speakers in installations or performances.


  • - aims to simplify the creation and control in real time of mass of spatialised sound objects on various kinds of loudspeaker configurations (particularly stereo, quadriphonic or octophonic setups, as well as domes of 16, 24 or 32 loudspeakers...) with many controllers (currently, you can control it with the GUI and the keyboard, and you can also add an Akai APC Mini, 2 tablets with Lemur App, 2 MIDI Fighter Twister and a Sensel Morph).


  • The AlloSphere Research Facility - The AlloSphere is a one-of-a-kind immersive instrument that is the culmination of 30 years of Professor JoAnn Kuchera-Morin’s creativity and research efforts in media systems and studio design. It is differentiated from conventional virtual reality environments by its seamless surround-view capabilities, ability to accommodate 30 or more people simultaneously in a shared virtual world with no loss of self, and its focus on multiple sensory modalities and interaction.


  • - Make a position-aware audio-walk, GPS-soundscape, local tourist guide or urban game with these Pure Data patches and authoring tool using MobMuPlat on your phone.

  • GSound: Interactive Sound Propagation for Games - We present a sound propagation and rendering system for generating realistic environmental acoustic effects in real time for game-like scenes. The system uses ray tracing to sample triangles that are visible to a listener at an arbitrary depth of reflection. Sound reflection and diffraction paths from each sound source to the listener are then validated using ray-based occlusion queries. Frame-to-frame caching of propagation paths is performed to improve the consistency and accuracy of the output. Furthermore, we present a flexible framework, which takes a small fraction of CPU cycles for time-critical scenarios. To the best of our knowledge, this is the first practical approach that can generate realistic sound and auralization for games on current platforms.

to sort

  • - FDTD or Yee's method (named after the Chinese American applied mathematician Kane S. Yee, born 1934) is a numerical analysis technique used for modeling computational electrodynamics (finding approximate solutions to the associated system of differential equations). Since it is a time-domain method, FDTD solutions can cover a wide frequency range with a single simulation run, and treat nonlinear material properties in a natural way.