Translatotron 3: Speech to Speech Translation with Monolingual Data
This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-speech translation from monolingual speech-text datasets that combines a masked autoencoder, unsupervised embedding mapping, and back-translation. Experimental results on speech-to-speech translation tasks between Spanish and English show that Translatotron 3 outperforms a baseline cascade system, with an improvement of 18.14 BLEU points on the synthesized Unpaired-Conversational dataset. In contrast to supervised approaches, which require real paired data or specialized modeling to replicate para-/non-linguistic information such as pauses, speaking rates, and speaker identity, Translatotron 3 demonstrates the ability to retain this information.
Model
The two training phases in the proposed approach. (1) Phase 1 uses the reconstruction loss via the auto-encoding path. (2) Phase 2 employs the reconstruction loss via back-translation.
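To make the two phases concrete, here is a minimal sketch of the two reconstruction losses. It is a toy, not the Translatotron 3 implementation: the shared encoder and the per-language decoders are stand-in linear maps, the loss is a plain mean squared error, and every name (`encode`, `decode`, `mask_frames`, `phase1_loss`, `phase2_loss`) is hypothetical. The frame masking in Phase 1 is an assumption based on the masked autoencoder mentioned in the abstract.

```python
import numpy as np


def encode(spectrogram, w_enc):
    """Toy shared encoder: one linear map applied to every frame."""
    return spectrogram @ w_enc


def decode(latent, w_dec):
    """Toy language-specific decoder: one linear map back to frames."""
    return latent @ w_dec


def reconstruction_loss(predicted, target):
    """Mean squared error between predicted and target frames."""
    return float(np.mean((predicted - target) ** 2))


def mask_frames(spectrogram, mask_prob, rng):
    """Zero out whole frames with probability mask_prob (assumed masking scheme)."""
    keep = rng.random(spectrogram.shape[0]) >= mask_prob
    return spectrogram * keep[:, None]


def phase1_loss(x, w_enc, w_dec_same, mask_prob, rng):
    """Phase 1: auto-encoding path. Encode the masked input and decode it
    with the decoder of the SAME language, then compare with the clean input."""
    masked = mask_frames(x, mask_prob, rng)
    return reconstruction_loss(decode(encode(masked, w_enc), w_dec_same), x)


def phase2_loss(x, w_enc, w_dec_same, w_dec_other):
    """Phase 2: back-translation path. Translate into the other language,
    translate the result back, then compare with the original input."""
    pseudo_translation = decode(encode(x, w_enc), w_dec_other)
    back_translation = decode(encode(pseudo_translation, w_enc), w_dec_same)
    return reconstruction_loss(back_translation, x)


rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 4))            # 5 frames, 4 feature bins (toy sizes)
w_enc = rng.normal(size=(4, 4))
w_dec_es = rng.normal(size=(4, 4))          # hypothetical Spanish decoder
w_dec_en = rng.normal(size=(4, 4))          # hypothetical English decoder

loss_phase1 = phase1_loss(frames, w_enc, w_dec_es, mask_prob=0.3, rng=rng)
loss_phase2 = phase2_loss(frames, w_enc, w_dec_es, w_dec_en)
```

In both phases the target is the original monolingual utterance, which is why no paired data is needed. Only the path through the network differs: Phase 1 stays within one language, while Phase 2 passes through the other language's decoder and back.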
Model Comparison Samples
This section and the following sections show samples from the Translatotron 3 model trained unsupervised, without any parallel data.
Spanish-to-English (on Conversational dataset)
Source (Spanish) | Reference (English) | Predicted (English), Source speaker |
---|---|---|
Creación de nuevos escenarios legales. | Creation of new legal scenarios. | Creation of new legal scenario. |
Sí, creo que puedo hacer eso. | Yeah! Yeah, I think I can do that. | Yeah i think i can do this. |
Ver toda la discografía de Eliseo Parra. | See the whole discography of Eliseo Parra. | See the whole discography of Arteropoly. |
Sellos relacionados con Richard Sjöberg. | Labels related to Richard Sjöberg. | Labels related to Richard Carlyin. |
UTR con un par de excepciones. | UTR with a couple of exceptions. | MS with a couple of exceptions. |
English-to-Spanish (on Conversational dataset)
Source (English) | Reference (Spanish) | Predicted (Spanish), Source speaker |
---|---|---|
So, why are you doing this? | Entonces, ¿por qué estás haciendo esto? | Bien por qué estás haciendo esto? |
Yeah, I know, but I need to learn. | Sí, lo sé, pero necesito aprender. | Sí lo sé pero necesito aprendeo. |
It is a great weight, but also it is a necessity. | Es un gran peso, pero también es una necesidad. | Es un gran peso pero también es una gran consecuencia. |
Check Availability at Residence Casamalfi. | compruebe la disponibilidad de Residence Casamalfi. | Compruebe la disponibilidad de residence stat. |
I do not care what he says. | No me importa lo que diga. | No me importa lo que le diga. |
Spanish-to-English (on CommonVoice11 Synthesized dataset)
Source (Spanish) | Reference (English) | Predicted (English), Source speaker |
---|---|---|
Esto es una familia. y en una familia. | This is a family and in a family. | This is a family, and, in a family. |
Participó en la Royal Rumble, pero fue eliminado por R-Truth. | He participated in the Royal Rumble, but was eliminated by R-Truth. | He participated in the royal rumble but was eliminated by airtrue. |
En Diciembre, el grupo está evidentemente presente en el tradicional evento Dis Inferno. | In December the group is evidently present in the traditional event Des Infn. | In December, the group is evidently present in the traditional event Dis Inferno. |
Los dos ganadores disputaron la final. | The two winners disputed the final. | The two winners disputed the final. |
Líricamente, el álbum es muy político. | Lyrically the album is very political. | Lyrically, the album is very political. |
English-to-Spanish (on CommonVoice11 Synthesized dataset)
Source (English) | Reference (Spanish) | Predicted (Spanish), Source speaker |
---|---|---|
Her mother, Angela, is a public servant, and her father, Tony, is a psychologist. | Su madre, Angela, es una servidora pública y su padre, Tony, es un psicólogo. | Su madre Angela es una poca sirviente y su padre Tony es un psicólogo |
The organization has worked in Honduras, Colombia, Venezuela, Uganda and the United States. | La organización ha trabajado en Honduras, Colombia, Venezuela, Uganda y Estados Unidos. | La organización ha trabajado en Honduras Colombia Venezuela Uganda y Estados Unidos. |
Practically all songs have been written by Michelle. | Prácticamente todas las canciones han sido escritas por Michelle. | Prácticamente todas las canciones han sido escritas por Miche. |
He attended Wellesley College, where he studied physics and astronomy. | Asistió al Wellesley College, donde estudió física y astronomía. | Asistió al Wesley college donde estudió física y astronomía. |
Three days in four chapters, in four stories of four friends. | Tres días en cuatro capítulos en cuatro historias de cuatro amigos. | Tres días en cuatro capítulos en cuatro historias de cuatro amigos. |
Spanish-to-English (on CommonVoice11 dataset)
Source (Spanish) | Reference (English) | Predicted (English), Source speaker |
---|---|---|
Trabajó con orquestas en Rusia, y con músicos en Europa y los Estados Unidos. | He worked with orchestras in Russia, and with musicians in Europe and the United States. | Traveled with orchestrs in Russia and with musicians in Europe and the United States. |
Con ella muchas ciudades y colonias asumieron el rango de "municipium". | With it, many cities and colonies assumed the rank of “municipium”. | With him many cities and colonies assumed the rank of minicippia. |
Hay tres generaciones por año en el sur de Texas. | There are three generations per year in the south of Texas. | I three generations per year in the south of Texas. |
Fue enterrado en Madison, Wisconsin. | He was buried in Madison, Wisconsin. | He was buried in Madison Wisconsin. |
Algunos autores clasifican esa información como falsa. | Some authors classify that information as false. | All authors classify that information as fall. |