Meta unveils AI-based Hokkien speech translation system

More than 40% of the world’s 7,000 living languages are passed down orally from generation to generation and have no formal writing system. Meta, which aims to develop a universal speech translator, has introduced a new AI-based speech-to-speech translation system for Hokkien, a primarily oral language widely spoken in the Chinese diaspora. The system translates spoken Hokkien into English, and vice versa, in real time.

AI to eliminate language barriers

AI machine translation systems still leave thousands of languages uncovered, so more than 20% of the world’s population cannot use them in their native language. Data scarcity is the main obstacle: these systems are typically trained on millions of parallel sentences, which simply do not exist for most languages. The challenge is even greater for direct speech-to-speech translation.

Meta is addressing these challenges with the Universal Speech Translator project, announced last February at the Meta AI “Inside the Lab” virtual event, which aims to build systems that translate speech directly from one language to another in real time without the need for a standard writing system.

Mark Zuckerberg, founder and CEO of Meta, said at the time:

“The ability to communicate with anyone in any language – that’s a superpower that people have always dreamed of, and AI will provide that in our lifetime.”

Overcoming the lack of data

The Meta AI researchers faced two problems: Hokkien has no widely used standard writing system, and paired speech data for it is very sparse compared to high-resource languages such as Spanish or English. In addition, English-to-Hokkien translators are relatively scarce, making it difficult to collect and annotate data for model training.

To overcome these problems, the researchers leveraged text written in Mandarin, a language closely related to Hokkien and with abundant resources. They worked closely with Hokkien speakers to ensure that the translations were correct.

Juan Pino, a researcher at Meta, explains:

“Our team first translated English or Hokkien speech into Mandarin text, then translated it into Hokkien or English – both with human annotators and automatically. They then added the matched sentences to the data used to train the AI model.”
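
The pivot idea in the quote can be sketched in a few lines. In the sketch below, `translate_en_to_zh` and `translate_zh_to_hk` are hypothetical stand-ins for the machine translation models (or human annotators) Meta relied on; none of these names come from Meta’s released code:

```python
# Hypothetical sketch of the Mandarin-pivot data augmentation described
# above. The translate_* callables stand in for real MT models or human
# annotators; they are assumptions, not Meta's actual API.

def mine_pair(english_sentence: str, translate_en_to_zh, translate_zh_to_hk):
    """Build a pseudo-parallel (English, Hokkien) pair by pivoting
    through Mandarin text, which is far better resourced than Hokkien."""
    mandarin = translate_en_to_zh(english_sentence)   # English -> Mandarin
    hokkien = translate_zh_to_hk(mandarin)            # Mandarin -> Hokkien
    return english_sentence, hokkien

# Mined pairs are then appended to the (much smaller) human-annotated
# training set, e.g.:
# train_set += [mine_pair(s, mt_en_zh, mt_zh_hk) for s in english_corpus]
```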

A speech-to-speech translation system

In a recent study, a Meta team applied a self-supervised discrete speech encoder to the target speech and trained a speech-to-unit translation (S2UT) model to predict discrete representations of the target speech directly.
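
Conceptually, these “discrete representations” are cluster IDs assigned to frames of self-supervised encoder features (HuBERT-style features clustered with k-means in the underlying work). The sketch below illustrates that quantization step; random tensors stand in for real encoder outputs, and the unit vocabulary size is an illustrative choice:

```python
import torch
from sklearn.cluster import KMeans

N_UNITS = 100  # size of the unit vocabulary (illustrative)

# 1) Fit the unit codebook once, on features pooled from a large corpus.
#    Placeholder: random tensors in place of real encoder outputs.
corpus_features = torch.randn(10_000, 768)
codebook = KMeans(n_clusters=N_UNITS, n_init=10, random_state=0)
codebook.fit(corpus_features.numpy())

# 2) Each target utterance then becomes a sequence of unit IDs,
#    which is what the S2UT model learns to predict.
utterance = torch.randn(500, 768)            # 500 frames of one utterance
unit_sequence = codebook.predict(utterance.numpy())
print(unit_sequence[:10])                    # e.g. [37 37 82  5 ...]
```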

The researchers used this S2UT method to convert input speech directly into a sequence of acoustic units, then generated waveforms from those units. They also adopted UnitY, a two-pass decoding mechanism in which a first decoder generates Mandarin text and a second decoder creates the units.
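
That two-pass structure can be sketched as follows. This is an illustrative skeleton, not Meta’s implementation: the module sizes are arbitrary, and the separately trained unit-to-waveform vocoder is omitted:

```python
import torch
import torch.nn as nn

# Illustrative skeleton of UnitY-style two-pass decoding: a first decoder
# produces text in a related high-resource language (Mandarin here), and a
# second decoder, conditioned on that first pass, produces the discrete
# acoustic units. A vocoder (omitted) then synthesizes the waveform.

class TwoPassTranslator(nn.Module):
    def __init__(self, d_model=256, text_vocab=8000, unit_vocab=100):
        super().__init__()
        self.speech_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.text_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.unit_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.unit_emb = nn.Embedding(unit_vocab, d_model)
        self.text_head = nn.Linear(d_model, text_vocab)
        self.unit_head = nn.Linear(d_model, unit_vocab)

    def forward(self, speech_feats, text_tokens, unit_tokens):
        # Pass 1: encode speech, decode Mandarin text (teacher-forced here).
        enc = self.speech_encoder(speech_feats)
        text_h = self.text_decoder(self.text_emb(text_tokens), enc)
        # Pass 2: decode units, attending to the first-pass hidden states.
        unit_h = self.unit_decoder(self.unit_emb(unit_tokens), text_h)
        return self.text_head(text_h), self.unit_head(unit_h)

model = TwoPassTranslator()
speech = torch.randn(1, 120, 256)           # 120 frames of encoder features
text = torch.randint(0, 8000, (1, 20))      # Mandarin text tokens
units = torch.randint(0, 100, (1, 300))     # target acoustic units
text_logits, unit_logits = model(speech, text, units)
```

Conditioning the unit decoder on the first-pass hidden states is what lets the intermediate Mandarin text guide the final unit prediction.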

In its current phase, this approach allows Hokkien speakers to converse with English speakers. Meta says the model is still being improved and, although it can only translate one sentence at a time, “it’s a step closer to a future where simultaneous translation between languages is possible.”

The researchers are also making their model, code, and benchmark data freely available so that others can build on their work.

Translated from Meta dévoile un système de traduction vocale du Hokkien basé sur l’IA