Distance-Based Sound Separation


Katharine Patterson, Kevin Wilson, Scott Wisdom, John R. Hershey
Google Research

[Figure: Distance-based sound separation block diagram]

Abstract

We propose the novel task of distance-based sound separation, where sounds are separated based only on their distance from a single microphone. In the context of assisted listening devices, proximity provides a simple criterion for sound selection in noisy environments that would allow the user to focus on sounds relevant to a local conversation. We demonstrate the feasibility of this approach by training a neural network to separate near sounds from far sounds in single-channel synthetic reverberant mixtures, relative to a threshold distance defining the boundary between near and far. With a single nearby speaker and four distant speakers, the model improves the scale-invariant signal-to-noise ratio (SI-SNR) by 4.4 dB for near sounds and 6.8 dB for far sounds.
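
For reference, SI-SNR measures separation quality up to an arbitrary rescaling of the estimate: the target is scaled to best match the estimate, and the ratio of the matched part to the residual is reported in dB; the improvement figures above compare each estimate's SI-SNR to that of the unprocessed mixture. Below is a minimal NumPy sketch of the standard SI-SNR definition (the helper name si_snr is ours for illustration, not code from the paper):

import numpy as np

def si_snr(estimate, target, eps=1e-8):
    # Scale-invariant signal-to-noise ratio (SI-SNR) in dB.
    # Scaling the target to best fit the estimate makes the metric
    # invariant to the estimate's overall gain.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target        # target-aligned component
    residual = estimate - projection   # error component
    return 10.0 * np.log10(
        (np.sum(projection**2) + eps) / (np.sum(residual**2) + eps))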

 

Paper

 

"Distance-Based Sound Separation",
Katharine Patterson, Kevin Wilson, Scott Wisdom, and John R. Hershey,
Proc. Interspeech, September 2022, Incheon, Korea.

[PDF]


Real data audio demo

This example was recorded with an iPhone in a room approximately 4 m by 5 m, with the near speaker approximately 0.5 m from the microphone and the far speaker approximately 2 m from the microphone. The example was processed by a model trained as described in the paper, with a training-data source presence probability of 0.5.

[Audio players: Mixture | Near estimate | Far estimate]

Synthetic mixtures audio demos

The examples below were generated using the procedure described in the paper. In all cases, the example audio is processed by two trained models: one trained with a source presence probability of 1.0 and the other with a source presence probability of 0.5. A sketch of the presence-probability mixing scheme follows the example lists below.

Examples with a source presence probability of 0.5, containing zero to five sources per example.

Examples with a source presence probability of 1.0, containing five sources per example.
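
For concreteness, here is a minimal sketch of how a source presence probability shapes the training mixtures: each candidate source is kept independently with that probability, so p = 0.5 yields zero to five active sources while p = 1.0 always yields all five. The function make_example and the noise stand-in signals are hypothetical illustrations under our assumptions, not the paper's actual data pipeline (which uses reverberant speech rendered in simulated rooms):

import numpy as np

def make_example(near_sources, far_sources, presence_prob, rng):
    # Each candidate source (a 1-D waveform) is included independently
    # with probability `presence_prob`. The near and far sums are the
    # two training targets; their sum is the model input.
    num_samples = (near_sources + far_sources)[0].shape[0]
    near_ref = np.zeros(num_samples)
    far_ref = np.zeros(num_samples)
    for src in near_sources:
        if rng.random() < presence_prob:
            near_ref += src
    for src in far_sources:
        if rng.random() < presence_prob:
            far_ref += src
    return near_ref + far_ref, near_ref, far_ref

# One candidate near speaker and four candidate far speakers, as in
# the paper's five-source configuration (noise stands in for speech).
rng = np.random.default_rng(0)
near = [rng.standard_normal(16000)]
far = [rng.standard_normal(16000) for _ in range(4)]
mixture, near_ref, far_ref = make_example(near, far, 0.5, rng)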

 

Last updated: June 2022