Facebook Research has unveiled SeamlessM4T, a translation and transcription model presented as the first all-in-one multilingual, multimodal translation system. The model addresses limitations of traditional speech-to-speech translation systems and sets new benchmarks in translation quality.
Historically, speech-to-speech translation has depended on cascaded systems, which chain several subsystems that perform the translation in stages. This approach has been a barrier to building scalable, high-performing unified translation systems. SeamlessM4T (Massively Multilingual & Multimodal Machine Translation) aims to close this gap. A single model supports:
- Speech-to-speech translation
- Speech-to-text translation
- Text-to-speech translation
- Text-to-text translation
- Automatic speech recognition for up to 100 languages
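To make the contrast with cascaded systems concrete, here is a minimal sketch of how a cascade chains separate stages for each task. Everything in it is a hypothetical stand-in: `toy_asr`, `toy_mt`, `toy_tts`, `cascade`, and the two-word lexicon are illustrative toys, not part of SeamlessM4T or any real library, and audio is represented as a tagged string rather than a waveform.

```python
from typing import Callable

# Hypothetical toy lexicon standing in for a machine translation model.
TOY_LEXICON = {"hello": "bonjour", "world": "monde"}


def toy_asr(audio: str) -> str:
    """Toy speech recognition: strip the 'audio:' tag to get the transcript."""
    prefix = "audio:"
    return audio[len(prefix):] if audio.startswith(prefix) else audio


def toy_mt(text: str) -> str:
    """Toy text translation: word-by-word lookup, unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_tts(text: str) -> str:
    """Toy speech synthesis: tag the text as audio."""
    return "audio:" + text


def cascade(*stages: Callable[[str], str]) -> Callable[[str], str]:
    """Compose stages so that each stage consumes the previous stage's output."""
    def run(x: str) -> str:
        for stage in stages:
            x = stage(x)
        return x
    return run


# In a cascade, each task needs its own chain of subsystems.
speech_to_speech = cascade(toy_asr, toy_mt, toy_tts)
speech_to_text = cascade(toy_asr, toy_mt)
text_to_speech = cascade(toy_mt, toy_tts)
text_to_text = cascade(toy_mt)

if __name__ == "__main__":
    print(speech_to_speech("audio:hello world"))
    print(speech_to_text("audio:hello world"))
```

In a chain like this, each task requires a different combination of separately built components, and a mistake in an early stage is carried into every later stage. A unified model of the kind described here covers the same tasks with one system instead.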
The model was built on 1 million hours of open speech audio data, used to learn self-supervised speech representations with w2v-BERT 2.0. The team then created SeamlessAlign, a multimodal corpus of automatically aligned speech translations. Combined with human-labeled and pseudo-labeled data, this corpus was used to fine-tune the system into a multilingual model that translates both speech and text from and into English.
On the FLEURS benchmark, SeamlessM4T outperformed the previous state of the art, with a 20% improvement in BLEU for direct speech-to-text translation. Comparative tests also showed a consistent lead over strong cascaded models.
Preliminary human evaluations were also encouraging: translations from English achieved XSTS scores consistently above 4 on a 5-point scale. To assess translation safety, SeamlessM4T was also evaluated for gender bias and added toxicity, showing up to a 63% reduction in added toxicity compared with existing state-of-the-art models.
As a commitment to the global research community, Facebook Research is open-sourcing a suite of resources from this project, including models, code, fine-tuning tools, and data. These resources are available in Facebook Research’s GitHub repository.