Translatotron 3: Speech to Speech Translation with Monolingual Data
This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-speech translation from monolingual speech-text datasets that combines a masked autoencoder, unsupervised embedding mapping, and back-translation. Experimental results on speech-to-speech translation between Spanish and English show that Translatotron 3 outperforms a baseline cascade system, with an improvement of 18.14 BLEU points on the synthesized Unpaired-Conversational dataset. In contrast to supervised approaches, which require real paired data or specialized modeling to replicate para-/non-linguistic information such as pauses, speaking rates, and speaker identity, Translatotron 3 is able to retain this information.
Model
	
The two training phases in the proposed approach: (1) Phase 1 uses the reconstruction loss via the auto-encoding path; (2) Phase 2 employs the reconstruction loss via back-translation.
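
To make the two phases concrete, here is a minimal toy sketch of the two reconstruction losses. It is not the Translatotron 3 implementation: the shared `encode` function, the per-language decoders `decode_es` and `decode_en`, the use of mean squared error, and the plain float vectors standing in for speech features are all assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]
Fn = Callable[[Vector], Vector]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def autoencoding_loss(x: Vector, encode: Fn, decode_same: Fn) -> float:
    """Phase 1: encode the input, decode it back into the SAME language,
    and compare the result with the original input."""
    return mse(x, decode_same(encode(x)))


def back_translation_loss(x: Vector, encode: Fn,
                          decode_other: Fn, decode_same: Fn) -> float:
    """Phase 2: translate into the OTHER language, translate that output
    back, and compare the round trip with the original input."""
    pseudo_translation = decode_other(encode(x))
    reconstruction = decode_same(encode(pseudo_translation))
    return mse(x, reconstruction)


# Hypothetical toy components (assumptions, not the real networks).
def encode(v: Vector) -> Vector:
    return [2.0 * t for t in v]


def decode_es(v: Vector) -> Vector:
    return [0.5 * t for t in v]


def decode_en(v: Vector) -> Vector:
    return [0.5 * t + 0.1 for t in v]


x_es = [1.0, -2.0, 0.5]  # stand-in for Spanish speech features
phase1 = autoencoding_loss(x_es, encode, decode_es)
phase2 = back_translation_loss(x_es, encode, decode_en, decode_es)
print(phase1, phase2)
```

Neither loss needs a paired Spanish-English example: both compare a reconstruction against the monolingual input itself, which is what allows training without parallel data.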
Model Comparison Samples
This section and the following sections show samples from the Translatotron 3 model trained unsupervised, without any parallel data.
        The first audio column labeled 
Spanish-to-English (on Conversational dataset)
| Source (Spanish) | Reference (English) | Predicted (English), Source speaker | 
|---|---|---|
| Creación de nuevos escenarios legales. | Creation of new legal scenarios. | Creation of new legal scenario. | 
| Sí, creo que puedo hacer eso. | Yeah! Yeah, I think I can do that. | Yeah i think i can do this. | 
| Ver toda la discografía de Eliseo Parra. | See the whole discography of Eliseo Parra. | See the whole discography of Arteropoly. | 
| Sellos relacionados con Richard Sjöberg. | Labels related to Richard Sjöberg. | Labels related to Richard Carlyin. | 
| UTR con un par de excepciones. | UTR with a couple of exceptions. | MS with a couple of exceptions. | 
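
As a rough illustration of the BLEU metric cited in the abstract, the sketch below scores one predicted sentence from the table above against its reference. It is a simplified, unsmoothed sentence-level BLEU written for this page, not the scorer used to produce the reported numbers; the `tokenize` and `sentence_bleu` helpers and the lowercasing and punctuation stripping are assumptions.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and replace punctuation with spaces (an assumed normalization)."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in text.lower())
    return cleaned.split()


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) multiplied by the brevity penalty."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU 0
        log_precision_sum += math.log(clipped / total)
    # Penalize hypotheses that are shorter than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


score = sentence_bleu("Creation of new legal scenarios.",
                      "Creation of new legal scenario.")
print(round(score, 4))
```

A single wrong word in a five-word sentence already lowers every n-gram precision, which is why short utterances with one misrecognized name, like several rows above, can score well below 1.0.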
English-to-Spanish (on Conversational dataset)
| Source (English) | Reference (Spanish) | Predicted (Spanish), Source speaker | 
|---|---|---|
| So, why are you doing this? | Entonces, ¿por qué estás haciendo esto? | Bien por qué estás haciendo esto? | 
| Yeah, I know, but I need to learn. | Sí, lo sé, pero necesito aprender. | Sí lo sé pero necesito aprendeo. | 
| It is a great weight, but also it is a necessity. | Es un gran peso, pero también es una necesidad. | Es un gran peso pero también es una gran consecuencia. | 
| Check Availability at Residence Casamalfi. | compruebe la disponibilidad de Residence Casamalfi. | Compruebe la disponibilidad de residence stat. | 
| I do not care what he says. | No me importa lo que diga. | No me importa lo que le diga. | 
Spanish-to-English (on CommonVoice11 Synthesized dataset)
| Source (Spanish) | Reference (English) | Predicted (English), Source speaker | 
|---|---|---|
| Esto es una familia. y en una familia. | This is a family and in a family. | This is a family, and, in a family. | 
| Participó en la Royal Rumble, pero fue eliminado por R-Truth. | He participated in the Royal Rumble, but was eliminated by R-Truth. | He participated in the royal rumble but was eliminated by airtrue. | 
| En Diciembre, el grupo está evidentemente presente en el tradicional evento Dis Inferno. | In December the group is evidently present in the traditional event Des Infn. | In December, the group is evidently present in the traditional event Dis Inferno. | 
| Los dos ganadores disputaron la final. | The two winners disputed the final. | The two winners disputed the final. | 
| Líricamente, el álbum es muy político. | Lyrically the album is very political. | Lyrically, the album is very political. | 
English-to-Spanish (on CommonVoice11 Synthesized dataset)
| Source (English) | Reference (Spanish) | Predicted (Spanish), Source speaker | 
|---|---|---|
| Her mother, Angela, is a public servant, and her father, Tony, is a psychologist. | Su madre, Angela, es una servidora pública y su padre, Tony, es un psicólogo. | Su madre Angela es una poca sirviente y su padre Tony es un psicólogo | 
| The organization has worked in Honduras, Colombia, Venezuela, Uganda and the United States. | La organización ha trabajado en Honduras, Colombia, Venezuela, Uganda y Estados Unidos. | La organización ha trabajado en Honduras Colombia Venezuela Uganda y Estados Unidos. | 
| Practically all songs have been written by Michelle. | Prácticamente todas las canciones han sido escritas por Michelle. | Prácticamente todas las canciones han sido escritas por Miche. | 
| He attended Wellesley College, where he studied physics and astronomy. | Asistió al Wellesley College, donde estudió física y astronomía. | Asistió al Wesley college donde estudió física y astronomía. | 
| Three days in four chapters, in four stories of four friends. | Tres días en cuatro capítulos en cuatro historias de cuatro amigos. | Tres días en cuatro capítulos en cuatro historias de cuatro amigos. | 
Spanish-to-English (on CommonVoice11 dataset)
| Source (Spanish) | Reference (English) | Predicted (English), Source speaker | 
|---|---|---|
| Trabajó con orquestas en Rusia, y con músicos en Europa y los Estados Unidos. | He worked with orchestras in Russia, and with musicians in Europe and the United States. | Traveled with orchestrs in Russia and with musicians in Europe and the United States. | 
| Con ella muchas ciudades y colonias asumieron el rango de "municipium". | With it, many cities and colonies assumed the rank of “municipium”. | With him many cities and colonies assumed the rank of minicippia. | 
| Hay tres generaciones por año en el sur de Texas. | There are three generations per year in the south of Texas. | I three generations per year in the south of Texas. | 
| Fue enterrado en Madison, Wisconsin. | He was buried in Madison, Wisconsin. | He was buried in Madison Wisconsin. | 
| Algunos autores clasifican esa información como falsa. | Some authors classify that information as false. | All authors classify that information as fall. |