Over a billion people speak languages not currently covered by commercial translation technologies, including spoken languages without a standard writing system.
Meta (formerly the Facebook company) has announced a long-term effort to build machine translation (MT) tools to address that.
Israeli researchers are among those who have developed the breakthrough technology.
MT systems for text translation typically rely on learning from millions of sentences of annotated data. As a result, high-quality translations have only been developed for the handful of languages that dominate the web. Expanding to other languages means finding ways to acquire and use training examples from languages with sparse web presences.
In a recent video, Mark Zuckerberg, Meta’s CEO, demonstrated the new translation technology with Hokkien, a language without a standard writing system that is spoken by tens of millions of people in China, Taiwan, Singapore, Malaysia and the Philippines.
Hokkien speakers will be able to speak in their native language, and non-speakers will hear their words in English, and vice versa, without the use of written text.
“Hokkien is spoken by millions of people, but since there’s no standard writing system, that makes it pretty challenging to build a translation system like this,” Zuckerberg said in the video in English. His words were instantly translated to Hokkien.
“That’s right, Hokkien was not taught in schools when I was a kid,” said Ping Jen, one of the researchers on the project, in the video. He spoke in Hokkien, and the AI translated his words into English. “It is passed down orally from generation to generation.”
Meta has announced two new projects. The first is No Language Left Behind – building AI that can learn from languages with fewer training examples, enabling expert-quality translations in hundreds of languages, ranging from Asturian to Urdu.
The second is Universal Speech Translator – designing novel approaches to translating from speech in one language to another in real time to support languages without a standard writing system, as well as those that are both written and spoken.