The AGI News

SeamlessM4T: Facebook Research’s Groundbreaking Multimodal Translation Model

August 24, 2023

Facebook Research has unveiled SeamlessM4T, a translation and transcription model billed as the first all-in-one multilingual and multimodal model of its kind. It tackles long-standing limitations of traditional speech-to-speech translation systems and sets new benchmarks in the field.

Historically, speech-to-speech translation has relied on cascaded systems, which chain separate subsystems (typically speech recognition, then text translation, then speech synthesis) so that each stage's output becomes the next stage's input. This design has made it difficult to build scalable, high-performing unified translation systems. SeamlessM4T (Massively Multilingual & Multimodal Machine Translation) aims to close this gap. A single model supports:

  • Speech-to-speech translation
  • Speech-to-text translation
  • Text-to-speech translation
  • Text-to-text translation
  • Automatic speech recognition for up to 100 languages
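
To make the cascaded-versus-unified distinction concrete, here is a toy sketch in Python. Everything in it is hypothetical: `Signal`, `asr`, `make_mt`, `tts`, and `unified` are illustrative stand-ins that only relabel modality and language, not SeamlessM4T's real components or API. The point is the shape of the two designs: a cascade composes independent stages, while a unified model exposes one call whose target modality is just an argument.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List


@dataclass(frozen=True)
class Signal:
    modality: str  # "speech" or "text"
    lang: str      # language code, e.g. "eng"
    content: str   # toy payload standing in for audio samples or tokens


Stage = Callable[[Signal], Signal]


def _expect(x: Signal, modality: str) -> None:
    # Each cascaded subsystem only accepts one input modality.
    if x.modality != modality:
        raise ValueError(f"expected {modality} input, got {x.modality}")


# Hypothetical toy subsystems: they relabel modality/language instead of doing real work.
def asr(x: Signal) -> Signal:
    _expect(x, "speech")
    return Signal("text", x.lang, x.content)


def make_mt(tgt_lang: str) -> Stage:
    def mt(x: Signal) -> Signal:
        _expect(x, "text")
        return Signal("text", tgt_lang, f"[{x.lang}->{tgt_lang}] {x.content}")
    return mt


def tts(x: Signal) -> Signal:
    _expect(x, "text")
    return Signal("speech", x.lang, x.content)


def cascade(stages: List[Stage]) -> Stage:
    """Chain subsystems so each stage consumes the previous stage's output."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)


def unified(x: Signal, tgt_lang: str, tgt_modality: str) -> Signal:
    """Toy single-model interface: the target language and modality are just arguments."""
    return Signal(tgt_modality, tgt_lang, f"[{x.lang}->{tgt_lang}] {x.content}")


src = Signal("speech", "eng", "hello")
s2st = cascade([asr, make_mt("fra"), tts])  # speech -> text -> translated text -> speech
print(s2st(src))                            # Signal(modality='speech', lang='fra', content='[eng->fra] hello')
print(unified(src, "fra", "speech"))        # Signal(modality='speech', lang='fra', content='[eng->fra] hello')
```

In a real cascade, a mistake made by an early stage (for example, a misrecognized word) is passed to every later stage, which is one motivation for a single direct model.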

SeamlessM4T builds on self-supervised speech representations learned with w2v-BERT 2.0 from 1 million hours of open speech audio. The team then created SeamlessAlign, a multimodal corpus of automatically aligned speech translations. After filtering this corpus and combining it with human-labeled and pseudo-labeled data, they developed a multilingual system that can translate both speech and text into and out of English.
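
The article does not say how SeamlessAlign's pairs were aligned automatically. One widely used technique for this kind of mining is to embed both sides in a shared vector space and keep candidate pairs whose ratio-margin score (cosine similarity divided by the average similarity to each side's k nearest neighbours) clears a threshold. The sketch below implements that generic technique under stated assumptions: the embeddings are toy vectors, and `mine_pairs`, `k`, and `threshold` are hypothetical names and values, not SeamlessAlign's actual procedure or settings.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_mean(v: Vec, pool: List[Vec], k: int) -> float:
    """Average cosine similarity between v and its k nearest neighbours in pool."""
    sims = sorted((cosine(v, p) for p in pool), reverse=True)[:k]
    return sum(sims) / len(sims)


def mine_pairs(src: List[Vec], tgt: List[Vec], k: int = 4,
               threshold: float = 1.06) -> List[Tuple[int, int, float]]:
    """Ratio-margin mining: score = cos(x, y) / ((knn_mean(x) + knn_mean(y)) / 2).

    k and threshold are tunable assumptions, not values taken from the article.
    """
    src_ctx = [knn_mean(x, tgt, k) for x in src]  # neighbourhood density around each source item
    tgt_ctx = [knn_mean(y, src, k) for y in tgt]  # neighbourhood density around each target item
    pairs = []
    for i, x in enumerate(src):
        scores = [cosine(x, y) / ((src_ctx[i] + tgt_ctx[j]) / 2) for j, y in enumerate(tgt)]
        j_best = max(range(len(tgt)), key=scores.__getitem__)
        if scores[j_best] >= threshold:  # keep only confidently aligned pairs
            pairs.append((i, j_best, scores[j_best]))
    return pairs


speech_emb = [[1.0, 0.0], [0.0, 1.0]]  # toy "speech" embeddings
text_emb = [[0.9, 0.1], [0.1, 0.9]]    # toy "text" embeddings
print([(i, j) for i, j, _ in mine_pairs(speech_emb, text_emb, k=2)])  # [(0, 0), (1, 1)]
```

Dividing by neighbourhood density, rather than thresholding raw cosine similarity, penalizes "hub" items that are close to everything, which tends to make mined pairs cleaner.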

On the FLEURS benchmark, SeamlessM4T outperformed the previous state of the art, improving BLEU by 20% for direct speech-to-text translation. Against strong cascaded systems, it also delivered higher translation quality into English for both speech-to-text and speech-to-speech translation.
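
BLEU scores a system's output by its n-gram overlap with reference translations, scaled down by a brevity penalty for outputs that are too short. The simplified corpus-level sketch below assumes one reference per sentence, whitespace tokenization, and no smoothing; published results such as those on FLEURS are normally computed with standard tooling such as sacreBLEU, so treat this as an illustration of the metric, not the evaluation script. The `relative_gain` helper assumes the 20% figure is a relative improvement over the previous best score.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Corpus BLEU (0-100) with one reference per hypothesis, whitespace tokens, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as often as the reference has it.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement, e.g. 0.2 means new is 20% higher than old."""
    return (new - old) / old


print(round(corpus_bleu(["the cat sat on the mat today"], ["the cat sat on the mat"]), 1))  # 80.9
```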

Preliminary human evaluations were also strong: for speech-to-text translations from English, XSTS scores were consistently above 4 on a 5-point scale. SeamlessM4T was also evaluated for gender bias and toxicity, reducing added toxicity in its translations by up to 63% compared with existing models.
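
The summary numbers above come from simple aggregations. The sketch below shows one plausible way to compute them, under loudly stated assumptions: "added toxicity" is taken to mean a translation flagged as toxic when its source was not, the detector is a crude per-language word list (the article does not describe the real detector, so the word lists here are hypothetical), and XSTS is averaged per language on its 1-5 scale. The helper names are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple


def has_listed_term(text: str, wordlist: Set[str]) -> bool:
    """Crude detector: does any lowercased whitespace token appear in the word list?"""
    return any(tok in wordlist for tok in text.lower().split())


def added_toxicity_rate(pairs: List[Tuple[str, str]], src_terms: Set[str],
                        tgt_terms: Set[str]) -> float:
    """Share of (source, translation) pairs where only the translation is flagged."""
    added = sum(
        1 for src, out in pairs
        if has_listed_term(out, tgt_terms) and not has_listed_term(src, src_terms)
    )
    return added / len(pairs)


def relative_reduction(new_rate: float, baseline_rate: float) -> float:
    """E.g. 0.63 means the new system's rate is 63% lower than the baseline's."""
    return (baseline_rate - new_rate) / baseline_rate


def mean_xsts(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Average XSTS rating (1-5 scale) per target language."""
    return {lang: mean(scores) for lang, scores in ratings.items()}


pairs = [("you are kind", "tu es gentil"), ("you are kind", "tu es idiot")]
print(added_toxicity_rate(pairs, {"stupid"}, {"idiot"}))  # 0.5
print(mean_xsts({"fra": [4, 5, 4, 5]}))                    # {'fra': 4.5}
```

The inputs here are made-up toy strings and ratings for illustration, not data from the SeamlessM4T evaluation.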

To support the global research community, Facebook Research is open-sourcing the project's models, code, fine-tuning tools, and data. All of these resources are available in Facebook Research's GitHub repository.


© 2023 AGI News All Rights Reserved.

Contact: community@superagi.com

No Result
View All Result
  • Home

Sign up for Newsletter