Unsupervised Multi-Channel Separation and Adaptation

Cong Han¹*, Kevin Wilson², Scott Wisdom², John R. Hershey²

¹Columbia University ²Google Research
*Work performed during an internship at Google.

Overview

A key challenge in machine learning is to generalize from training data to a test domain of interest. This work generalizes the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data and are tested on real AMI recordings containing overlapping speech. To objectively evaluate our models, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that a fine-tuned semi-supervised model yields the largest improvements in SI-SNR and in human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
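To make the MixIT objective more concrete, here is a minimal single-channel sketch in plain Python. It assumes two input mixtures, a model that has already separated their sum into M estimated sources, and a simple negative SNR as the per-mixture loss. The function names `neg_snr` and `mixit_loss` are hypothetical, and the multi-channel generalization described above is not reproduced here.

```python
import itertools
import math


def neg_snr(reference, estimate, eps=1e-8):
    """Negative SNR in dB between a reference and an estimate (lower is better)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return -10.0 * math.log10((signal + eps) / (noise + eps))


def mixit_loss(mix1, mix2, est_sources):
    """MixIT-style loss for two mixtures and M estimated sources (illustrative sketch).

    est_sources are assumed to be the model's separation of mix1 + mix2.
    Every source is assigned to exactly one of the two mixtures, the
    assigned sources are summed ("remixed"), and the assignment with the
    lowest total loss is kept. No isolated reference sources are used.
    """
    num_samples = len(mix1)
    best = math.inf
    # Enumerate all 2**M binary assignments (source -> mixture 0 or 1).
    for assignment in itertools.product((0, 1), repeat=len(est_sources)):
        remix = [[0.0] * num_samples, [0.0] * num_samples]
        for source, target in zip(est_sources, assignment):
            for t in range(num_samples):
                remix[target][t] += source[t]
        loss = neg_snr(mix1, remix[0]) + neg_snr(mix2, remix[1])
        best = min(best, loss)
    return best


if __name__ == "__main__":
    # Toy example: mixture 1 holds one source, mixture 2 holds two.
    s1 = [1.0, 0.0, -1.0, 0.5]
    s2 = [0.2, 0.4, 0.1, -0.3]
    s3 = [0.0, -0.5, 0.3, 0.3]
    mix2 = [a + b for a, b in zip(s2, s3)]
    # The estimated sources may come out in any order; the search handles it.
    print(mixit_loss(s1, mix2, [s3, s1, s2]))
```

The brute-force search over 2^M assignments is cheap for the small source counts used in speech separation. Because the loss only compares remixed estimates with the input mixtures, it needs no isolated reference sources, which is what makes unsupervised training on real recordings possible.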

Audio Demos

We evaluated our models on both synthetic and real mixtures from the AMI Corpus, training single- and multi-channel models on various combinations of datasets with supervised permutation invariant training (PIT) and unsupervised MixIT. In the tables below, "Sup." marks supervised PIT training and "Unsup." marks unsupervised MixIT training.
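For contrast with MixIT, supervised PIT can be sketched as follows. This is an illustrative assumption rather than the authors' code: it uses negative SI-SNR (the metric quoted in the overview) as the per-source loss, plain Python lists as waveforms, and hypothetical function names `si_snr`, `si_snr_improvement`, and `pit_loss`.

```python
import itertools
import math


def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant SNR in dB (higher is better)."""
    # Zero-mean both signals so DC offsets do not affect the score.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Project the estimate onto the reference: the scaled target is the part
    # of the estimate explained by the reference; the rest counts as error.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    error = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    error_energy = sum(x * x for x in error)
    return 10.0 * math.log10((target_energy + eps) / (error_energy + eps))


def si_snr_improvement(reference, estimate, mixture):
    """SI-SNR of the estimate minus SI-SNR of the unprocessed mixture."""
    return si_snr(reference, estimate) - si_snr(reference, mixture)


def pit_loss(references, estimates):
    """Permutation invariant loss (illustrative sketch).

    Assumption: negative SI-SNR as the per-source training loss. Every
    assignment of estimates to references is tried and the lowest total
    loss is kept, so the model is not penalized for its output order.
    """
    best = math.inf
    for perm in itertools.permutations(range(len(estimates)), len(references)):
        loss = sum(-si_snr(references[i], estimates[j]) for i, j in enumerate(perm))
        best = min(best, loss)
    return best
```

Unlike MixIT, PIT compares estimates directly with isolated reference sources, so it applies only where such references exist, as in synthetic mixtures.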

Results on synthetic AMI

	Example 1		Example 2		Example 3
	Source 1	Source 2	Source 1	Source 2	Source 1	Source 2
Noisy Mixture
1 Microphone
Sup. Synth AMI
Unsup. AMI
YFCC100M Warm Start
Sup. Synth AMI, Unsup. AMI
Sup. Synth AMI, Unsup. AMI, YFCC100M Warm Start
4 Microphones
Sup. Synth AMI
Unsup. AMI
Sup. Synth AMI, Unsup. AMI
Sup. Synth AMI, Unsup. AMI, YFCC100M Warm Start
Reference Audio
Headset Filtered to Distant Mic
Headset

Results on real AMI

	Example 1		Example 2		Example 3
	Source 1	Source 2	Source 1	Source 2	Source 1	Source 2
Noisy Mixture
1 Microphone
Sup. Synth AMI
Unsup. AMI
YFCC100M Warm Start
Sup. Synth AMI, Unsup. AMI
Sup. Synth AMI, Unsup. AMI, YFCC100M Warm Start
4 Microphones
Sup. Synth AMI
Unsup. AMI
Sup. Synth AMI, Unsup. AMI
Sup. Synth AMI, Unsup. AMI, YFCC100M Warm Start
Reference Audio
Headset Filtered to Distant Mic
Headset