Audio samples for "SpeechPainter: Text-conditioned Speech Inpainting"

Zalán Borsos, Matt Sharifi, Marco Tagliasacchi

Capabilities of SpeechPainter

SpeechPainter receives as input an utterance containing a gap of at most one second, together with its transcript. It must learn to identify the part of the transcript corresponding to the audio surrounding the gap (dark gray) and to the gap itself (bold), and fill in the gap with the correct content while maintaining speaker identity, prosody and recording environment conditions. The following samples for unseen speakers are randomly selected from LibriTTS (test-clean and test-other splits) and VCTK.
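Each input sample below is annotated with a gap start and a gap length. Judging from the one-second limit and the values shown, these appear to be in seconds. The sketch below maps such an annotation to sample indices at the stated 24 kHz or 16 kHz rates and silences that span. It assumes the gap is represented by zeroed samples, which this page does not specify, and the helper names `gap_to_sample_range` and `mask_gap` are hypothetical, not part of SpeechPainter.

```python
def gap_to_sample_range(gap_start_s: float, gap_length_s: float,
                        sample_rate_hz: int) -> tuple[int, int]:
    """Convert a (gap start, gap length) annotation, assumed to be in seconds,
    into [start, end) sample indices."""
    if not 0.0 < gap_length_s <= 1.0:
        raise ValueError("gaps in these samples are at most one second long")
    start = round(gap_start_s * sample_rate_hz)
    end = start + round(gap_length_s * sample_rate_hz)
    return start, end


def mask_gap(waveform: list[float], gap_start_s: float, gap_length_s: float,
             sample_rate_hz: int) -> list[float]:
    """Return a copy of the waveform with the gap replaced by zeros.
    Zero-filling is an assumption made for illustration only."""
    start, end = gap_to_sample_range(gap_start_s, gap_length_s, sample_rate_hz)
    if end > len(waveform):
        raise ValueError("gap extends past the end of the utterance")
    return waveform[:start] + [0.0] * (end - start) + waveform[end:]


if __name__ == "__main__":
    # First sample in the table below: gap start 0.89, gap length 0.84, at 24 kHz.
    print(gap_to_sample_range(0.89, 0.84, 24_000))
```

Under these assumptions, the first sample's gap covers samples 21360 to 41520 of the 24 kHz waveform.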

For the inpainted gap, the model can maintain:	Input Sample (transcript, gap start, gap length)	SpeechPainter
speaker id and prosody	\|George Robertson desc\|ribed the plan \|as an outrageous and disg\|raceful decision. 0.89, 0.84
speaker id and prosody	\|I am very sorry that Alastair \|Campbell has taken \|this decision.\| 1.40, 0.84
background noise	His recovery was destined to be almost as sudden \|as his disappearance, and was due\| directly \|to the tra\|mp Alex had brought to Sunnyside. 1.83, 0.75
background noise	The co\|mparative adva\|ntages of complian\|ce and non-compliance are as follows:\| 0.39, 0.83
bandpass filtering due to the microphone	But, as I suppose your Excellency is joking, I will add no more. \|He replied that, f\|ar from\| joking,\| he meant solemn earnest. 1.83, 0.75
bandpass filtering due to the microphone	"You should go to bed," she replied,\| with that ironical air \|which went\| so well\| with her delicate and witty face. 1.70, 0.95
reverberation	He went \|to the window\| and sat down,\| scanning the groups,\| and listening to what was being said around him. 0.65, 0.95
reverberation	\|We shall say no more, but I tru\|st you understand\| the responsibility\| you have? 1.40, 0.84

Use-cases

The samples were collected by the authors.

Use-case	Input Sample	SpeechPainter
Grammar correction	Tony and me went to the store.	Tony and I went to the store.
Grammar correction	We swimmed in the river last weekend, and the water was cold.	We swam in the river last weekend, and the water was cold.
Grammar correction	Yesterday, I gone to the store and I bought some milk.	Yesterday, I went to the store and I bought some milk.
Pronunciation correction	Yesterday I bought a very nice watch.
Pronunciation correction	I would like the steak cooked medium-rare.
Pronunciation correction	Then, I suddenly shifted into third gear on the highway.
Background noise removal	Now, suddenly, we have this new landscape.
Background noise removal	The quick brown fox jumped over the lazy dog.
Packet loss correction	SpeechPainter can also be used for correcting packet losses.

Failure modes

SpeechPainter is NOT robust to all:	Input Sample (transcript, gap start, gap length)	Target	SpeechPainter
speakers	The captain behaved perfectly well in this critical instant, commanding a \|dead sile\|nce, and the closes\|t attention to his orders.\| 0.65, 0.95
background noises	Mr. \|Bell said \|they absolutely li\|ved upon water-porridge\| for years--how, he did not know; but long after the creditors had given up hope of any payment of old Mr. Thornton's debts (if, indeed, they ever had hoped at all about it, after his suicide,) this young man returned to Milton, and went quietly round to each creditor, paying him the first instalment of the money owing to him. 0.76, 1.00
content	\|The project has already secured the sup\|port of Sir Sean\| Connery.\| 1.72, 0.95
speech tempo	\|Hopefully, he will be \|back\| in contention next week.\| 1.00, 1.00

More samples from LibriTTS test splits and VCTK (24 kHz)

Sample ID	Input Sample (transcript, gap start, gap length)	Target Sample	SpeechPainter
libritts_test_clean 7021_85628_000019_000000	But just at this instant the Princ\|ess came trip\|ping across the y\|ard. She was dresse\|d in white silk with bows of ribbon. 0.62, 0.94
libritts_test_clean 2300_131720_000004_000012	Bur\|bank and his tribe represe\|nt in the vegetable\| worl\|d, Edison in the mechanical. 1.58, 0.95
libritts_test_clean 1580_141083_000012_000000	"The first page on the \|floor, \|the second in the win\|dow, the third where you left \|it," said he. 0.64, 1.00
libritts_test_clean 5639_40744_000020_000002	The boy having embraced his mother, calling her his cousin, \|and his grandmother,\| calling her his\| benefactress,\| repeated his grandfather's question. 1.43, 0.82
libritts_test_clean 5142_36586_000010_000000	\|EFFECTS OF \|THE INCREASED USE \|AND DISUSE OF PARTS.\| 0.47, 1.00
libritts_test_other 4198_61336_000001_000002	The national exchequer had \|been exhausted by the l\|oss of tri\|bute from revolt\|ing provinces, trade was paralysed, and the industries were in a languishing condition. 1.40, 0.93
libritts_test_other 533_1066_000015_000000	The doctor was puffing \|somewhat \|when we\| finally came to a halt\|. 0.64, 0.85
libritts_test_other 3080_5040_000000_000006	Would it would leave me, and th\|en I could believe I sh\|all not always have\| occasio\|n for it. 1.72, 0.95
libritts_test_other 3005_163390_000002_000000	WELL, all day him and the king was hard at it, \|ri\|gging up a sta\|ge and a curtain and a row of ca\|ndles for footlights; and that night the house was jam full of men in no time. 0.47, 1.00
libritts_test_other 4294_35475_000021_000001	But remember that thou canst not keep them sharp \|and shining, unless they are u\|sed at least on\|ce each\| day in some unselfish service." 1.58, 0.95
vctk p277_341	\|A spokeswoman for Edin\|burgh City Counc\|il confirmed it\|s support for the company. 0.99, 0.75
vctk p272_008	These take the shape of a \|long round a\|rch, with its \|path high above, and its two\| ends apparently beyond the horizon. 0.47, 0.78
vctk p268_028	\|Wages and salaries account fo\|r a major prop\|ortion of our \|expenses. 1.40, 0.93
vctk p245_081	It just \|shows the arrogance \|of Labour\| and the Liberal Dem\|ocrats. 1.40, 0.84
vctk p265_180	Any \|change would b\|e subject\| to the Scottish Parliament's \|approval. 0.64, 0.85

Subset of samples used for the listening study (16 kHz)

Sample ID	Input Sample (transcript, gap start, gap length)	Target Sample	TTS	SpeechPainter
libritts_test_clean 1580_141083_000010_000000	"The mom\|ent I looked at my table, I was aware\| that someone had rumm\|aged among my pa\|pers. 1.58, 0.83
libritts_test_clean 7127_75946_000040_000001	Suddenly, for the purpose of restoring peace and ord\|er, Spring,\| accompanied by\| his whole court,\| made his appearance. 1.28, 0.85
libritts_test_clean 5683_32866_000012_000001	But don't these \|very wi\|se things\| sometimes turn out very fool\|ishly? 0.38, 0.78
libritts_test_clean 4970_29095_000006_000000	\|I wanted to talk with thee a \|little about thy\| plans.\| 1.44, 0.93
libritts_test_clean 7176_88083_000015_000000	The hawk sat u\|pon the branch and watched his qua\|rry swimming\| beneath the surfa\|ce. 1.55, 0.76
libritts_test_other 5484_24318_000068_000001	He \|was appearing before his c\|ompanions\| only to give truth\| its just due. 1.05, 0.89
libritts_test_other 5764_299665_000104_000000	The third stone is that matter and force cannot exist ap\|art--no matter \|without f\|orce--n\|o force without matter. 1.55, 0.76
libritts_test_other 4294_35475_000036_000003	My friend, \|the Fly, sent me \|to guide you\| to a place of safety.\| 0.98, 0.90
libritts_test_other 7902_96592_000061_000002	\|Why, your clothes don't fit you, \|and your cap's put on \|all skew-rew.\| 1.36, 0.92
libritts_test_other 6128_63240_000013_000002	He threw it down at the approach of Mrs. Luna, laughed, shook hands with her, and said in answer to her last re\|mark, "You \|imply that \|you do tell fi\|bs. 1.36, 0.78
vctk p271_344	A form\|al announcement is expecte\|d this morning at\| a news conference.\| 1.31, 0.98
vctk p306_234	\|Now, suddenly, w\|e have this \|new landsc\|ape. 1.68, 0.85
vctk p286_119	\|Our offer represents an attra\|ctive price for \|the busine\|ss. 1.76, 0.96
vctk p228_032	\|We must provide \|a long-term so\|lution to tackle this atti\|tude. 0.84, 0.83
vctk p311_004	We also \|need a small plastic sna\|ke and a big\| toy frog for the ki\|ds. 1.24, 0.87

Ablation of adversarial training

Sample ID	Input Sample (transcript, gap start, gap length)	Target Sample	SpeechPainter w/o Adversarial Training	SpeechPainter
libritts_test_clean 8463_294825_000022_000002	Yet this loose structure gi\|ves the n\|ovel \|an air of documentary realism.\| 0.28, 0.91
libritts_test_clean 5639_40744_000000_000004	But Rodolfo had been struck by the great beauty of Leocadia, the hidalgo's daughter, \|and present\|ly he began to entertain the idea\| of enjoying it at all hazards. 0.38, 0.90
libritts_test_clean 2300_131720_000016_000013	And\|rews skipped from under; \|he obeyed orders;\| I did\| not. 1.36, 0.96
libritts_test_clean 4970_29095_000008_000000	"I know," said Margaret Bolton, with a half anxious smile,\| "thee chafe\|s against all the \|ways of Friend\|s, but what will thee do? 1.08, 0.96
libritts_test_clean 7176_88083_000023_000002	At the same mo\|ment he felt the light \|restraint of the al\|most invisible \|leader upon his wings, where the other two flies had affixed themselves. 1.11, 0.99
libritts_test_clean 7021_79740_000007_000000	Now Delia \|contrived to \|obtain a great \|influence and ascende\|ncy over the minds of the children by means of these dolls. 0.66, 0.95
libritts_test_clean 8555_284447_000035_000001	Then she \|seated her\|self in an out-of-\|the-way place and quietly wai\|ted. 0.43, 0.88
libritts_test_clean 8230_279154_000013_000001	There may be a specific feeling which could be called the f\|eeling of "pa\|stness,"\| especially where immediate mem\|ory is concerned. 0.54, 0.92
libritts_test_clean 7729_102255_000010_000014	\|They were not only well armed \|and supplied,\| but wrought\| up to the highest pitch of partisan excitement. 1.31, 0.93
libritts_test_clean 237_134500_000003_000002	She \|hated to see the Sunday new\|spapers come in\|to the house.\| 1.29, 1.00