Unsupervised Multi-Channel Separation and Adaptation

|paper|

Cong Han¹*, Kevin Wilson², Scott Wisdom², John R. Hershey²

¹Columbia University, ²Google Research
*Work performed during internship at Google.

Overview

A key challenge in machine learning is generalizing from training data to a test domain of interest. This work generalizes the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data, and are tested on real AMI recordings containing overlapping speech. To evaluate our models objectively, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that a fine-tuned semi-supervised model yields the largest improvement in SI-SNR and in human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
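For readers unfamiliar with the SI-SNR metric mentioned above, the following is a minimal sketch of one common definition of scale-invariant signal-to-noise ratio. The function name `si_snr`, the `eps` stabilizer, and the choice not to subtract the signal mean are assumptions made for illustration; the exact definition used in the paper may differ.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Illustrative sketch only: no mean removal, and eps is a hypothetical
    stabilizer to avoid division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    # Project the estimate onto the reference to get the optimally scaled target.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in reference]
    # Whatever is left after removing the scaled target counts as noise.
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


reference = [1.0, -1.0, 0.5, 0.25]
estimate = [0.9, -1.1, 0.6, 0.2]
louder_estimate = [2.0 * e for e in estimate]
print(si_snr(estimate, reference), si_snr(louder_estimate, reference))
```

Because the reference is rescaled to best match the estimate before the error is measured, changing the gain of the estimate leaves the score essentially unchanged, which is the property that makes the metric convenient for comparing separation models.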

Audio Demos

We evaluated our models on both synthetic and real mixtures from the AMI Corpus, training single- and multi-channel models on various combinations of datasets with supervised permutation invariant training (PIT) and unsupervised MixIT.
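The difference between the two training objectives can be sketched as follows. This is a simplified single-channel illustration, not the authors' implementation: the helper names (`mse`, `pit_loss`, `mixit_loss`) are hypothetical, mean squared error stands in for whatever signal-level loss the models actually use, and the exhaustive search over assignments is only practical for a small number of sources. PIT needs the isolated reference sources, while MixIT only needs the reference mixtures that were summed to form the model input.

```python
import itertools


def mse(a, b):
    """Mean squared error between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Supervised PIT: best one-to-one matching of estimates to reference sources."""
    if len(estimates) != len(references):
        raise ValueError("PIT needs one estimate per reference source")
    n = len(references)
    return min(
        sum(mse(estimates[p], references[i]) for i, p in enumerate(perm)) / n
        for perm in itertools.permutations(range(n))
    )


def mixit_loss(estimates, mixtures):
    """Unsupervised MixIT: assign each estimate to exactly one reference mixture,
    sum the estimates assigned to each mixture, and keep the best assignment."""
    num_samples = len(mixtures[0])
    best = float("inf")
    # Each tuple gives, for every estimated source, the index of its mixture.
    for assignment in itertools.product(range(len(mixtures)), repeat=len(estimates)):
        remixed = [[0.0] * num_samples for _ in mixtures]
        for est, mix_idx in zip(estimates, assignment):
            for t, value in enumerate(est):
                remixed[mix_idx][t] += value
        loss = sum(mse(r, m) for r, m in zip(remixed, mixtures)) / len(mixtures)
        best = min(best, loss)
    return best


# Toy signals: three non-overlapping "sources" forming two reference mixtures.
s1, s2, s3 = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]
mix1 = [a + b for a, b in zip(s1, s2)]
mix2 = s3
print(pit_loss([s2, s1], [s1, s2]))
print(mixit_loss([s3, s1, s2], [mix1, mix2]))
```

In this toy case both losses reach zero because the estimates match the true sources up to ordering. In the MixIT case that is achieved without ever showing the loss function the isolated sources, which is what allows training directly on real recordings.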

Results on synthetic AMI

Columns: Example 1, Example 2, Example 3 (each with Source 1 and Source 2)

Rows:

Noisy Mixture

1 Microphone
    Sup. Synth AMI
    Unsup. AMI, YFCC100M Warm Start
    Sup. Synth AMI, Unsup. AMI
    Sup. Synth AMI, Unsup. AMI, YFCC100M Warm Start

4 Microphones
    Sup. Synth AMI
    Unsup. AMI
    Sup. Synth AMI, Unsup. AMI
    Sup. Synth AMI, Unsup. AMI, YFCC100M Warm Start

Reference Audio
    Headset Filtered to Distant Mic
    Headset

Results on real AMI

Columns: Example 1, Example 2, Example 3 (each with Source 1 and Source 2)

Rows:

Noisy Mixture

1 Microphone
    Sup. Synth AMI
    Unsup. AMI, YFCC100M Warm Start
    Sup. Synth AMI, Unsup. AMI
    Sup. Synth AMI, Unsup. AMI, YFCC100M Warm Start

4 Microphones
    Sup. Synth AMI
    Unsup. AMI
    Sup. Synth AMI, Unsup. AMI
    Sup. Synth AMI, Unsup. AMI, YFCC100M Warm Start

Reference Audio
    Headset Filtered to Distant Mic
    Headset