With advances in speech and natural language processing, there is hope that one day you will be able to ask your virtual assistant what the best salad ingredients are. It is already possible to ask your home gadget to play music, or to switch on by voice command, a feature found in many devices.
If you speak Moroccan, Algerian, Egyptian, Sudanese, or any of the other dialects of Arabic, which vary immensely from region to region and some of which are mutually unintelligible, it is a different story. If your first language is Arabic, Finnish, Mongolian, Navajo, or any other language with a high level of morphological complexity, you may feel left out.
These complex constructs intrigued Ahmed Ali enough to look for a solution. He is the chief engineer of the Arabic Language Technologies Group at the Qatar Computing Research Institute (QCRI), part of the Qatar Foundation’s Hamad Bin Khalifa University, and the founder of ArabicSpeech, a “community that exists for the benefit of Arabic linguistics and language technologies.”
Ali became captivated by the idea of talking to cars and gadgets many years ago at IBM. “Can we build a machine that understands different dialects: an Egyptian pediatrician automating a prescription, a Syrian teacher helping children get the most important parts of their lessons, or a Moroccan cook describing the best couscous recipe?” he asks. However, the algorithms that power these machines cannot sift through the 30 or so varieties of Arabic, let alone make sense of them. Most speech recognition tools today work only in English and a handful of other languages.
The coronavirus pandemic has further fueled an already growing reliance on voice technologies, as natural language processing tools have helped people comply with stay-at-home guidelines and physical distancing measures. And while we already use voice commands to make e-commerce purchases and manage our households, the future holds even more applications.
Millions of people worldwide use massive open online courses (MOOCs), which offer open access and unlimited participation. Speech recognition is one of the main features of MOOCs: it lets students search within specific areas of the spoken course content and enables translation via subtitles. Speech technology also makes it possible to digitize lectures, displaying spoken words as text in university classrooms.
According to a recent article in Speech Technology magazine, the voice and speech recognition market is projected to reach $26.8 billion by 2025, as millions of consumers and businesses around the world come to rely on voice bots, not only to interact with their devices or cars, but also to improve customer service, drive innovation in healthcare, and improve accessibility and inclusivity for people with hearing, speech, or motor disabilities.
In a 2019 survey, Capgemini predicted that by 2022 more than two in three consumers would choose voice assistants over visits to shops or bank branches, a share that could justifiably rise given the home-based, physically distanced life and commerce that the pandemic has forced upon the world for more than a year and a half.
Nevertheless, these devices fail to deliver for large parts of the world. For the 30 or so varieties of Arabic and their millions of speakers, this is a substantial missed opportunity.
Arabic for machines
English- or French-speaking voice bots are far from perfect. However, teaching Arabic to machines is particularly difficult for several reasons. These are three generally recognized challenges:
- Lack of diacritical marks. Arabic dialects are vernacular, that is, primarily spoken rather than written. Most of the available text is not diacritized, i.e. it lacks marks comparable to the acute (´) or grave (`) accents, which indicate the sound values of letters. Hence, it is difficult to determine where the vowels go.
- Lack of resources. There is a shortage of labeled data for the various Arabic dialects. Collectively, the dialects lack standardized orthographic rules that dictate how a language is written, including norms for spelling, hyphenation, word breaks, and emphasis. These resources are critical for training computer models, and their scarcity has hampered the development of Arabic speech recognition.
- Morphological complexity. Arabic speakers do a lot of code switching. For example, in the areas colonized by the French (the North African countries of Morocco, Algeria, and Tunisia), the dialects contain many French loanwords. As a result, there is a large number of so-called out-of-vocabulary words, which speech recognition technologies cannot make sense of because these words are not Arabic.
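The first and third challenges can be made concrete with a small Python sketch. It is an illustration only, not QCRI's method: the function names `strip_diacritics` and `oov_rate`, the toy lexicon, and the romanized example utterance are all hypothetical. The first part shows how three differently vowelled Arabic words collapse into the same undiacritized spelling; the second measures what share of an utterance falls outside a recognizer's vocabulary.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop every Unicode nonspacing mark (category Mn).

    This covers the Arabic short-vowel marks (harakat) and leaves
    the base letters untouched.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def oov_rate(tokens: list[str], lexicon: set[str]) -> float:
    """Fraction of tokens that are missing from the lexicon."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    missing = sum(1 for token in tokens if token not in lexicon)
    return missing / len(tokens)


# Three different words, written with explicit vowel marks.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # kutub, "books"
KUTIBA = "\u0643\u064F\u062A\u0650\u0628\u064E"  # kutiba, "it was written"

# Without the marks, all three become the same three letters (k-t-b).
undiacritized = {strip_diacritics(w) for w in (KATABA, KUTUB, KUTIBA)}

# Hypothetical toy data: a romanized utterance with one French loanword.
toy_lexicon = {"ana", "rayeh", "el", "souq"}
utterance = ["ana", "rayeh", "el", "marché"]
rate = oov_rate(utterance, toy_lexicon)

if __name__ == "__main__":
    print(len(undiacritized), "distinct spelling(s) after stripping")
    print(f"out-of-vocabulary rate: {rate:.0%}")
```

A real system would work with far larger lexicons and with Arabic script throughout; the point here is only that undiacritized text is ambiguous and that loanwords show up as unknown tokens.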
“But the field moves at lightning speed,” says Ali. Many researchers are working together to advance it even faster. Ali’s Arabic Language Technology Laboratory is leading the ArabicSpeech project to bring Arabic translations together with the dialects native to each region. For example, Arabic dialects can be divided into four regional groups: North African, Egyptian, Gulf, and Levantine. However, since dialects do not respect borders, the division can be as fine-grained as one dialect per city; for example, a native Egyptian speaker can tell their own Alexandrian dialect apart from that of a fellow citizen from Aswan (1,000 kilometers away on the map).
Building a tech-savvy future for everyone
At this point, machines are about as accurate as human transcribers, thanks largely to advances in deep neural networks, a subfield of machine learning that relies on algorithms inspired by how the human brain works, biologically and functionally. Until recently, however, speech recognition was somewhat hacked together. The technology has historically relied on separate modules for acoustic modeling, building pronunciation dictionaries, and language modeling, all of which have to be trained separately. More recently, researchers have been training models that convert acoustic features directly into text transcriptions, potentially optimizing all parts for the end task.
Despite these advances, Ali still cannot give voice commands in his native Arabic to most devices. “It’s 2021 and I still can’t speak my dialect to many machines,” he says. “I mean, now I have a device that can understand my English, but the machine recognition of the Arabic language with multiple dialects has not yet taken place.”
Making this possible is the focus of Ali’s work, which has culminated in the first transformer for recognizing Arabic speech and its dialects, one that has achieved unprecedented performance. The technology, known as the QCRI Advanced Transcription System, is currently used by the broadcasters Al-Jazeera, DW, and BBC to transcribe online content.
There are several reasons why Ali and his team are now succeeding in building these speech engines. First and foremost, he says: “There is a need to provide resources for all dialects. We need to build the resources to then train the model.” Advances in computer processing mean that computationally intensive machine learning now runs on graphics processing units, which can process and display complex graphics quickly. As Ali says: “We have great architecture, good modules and data that represent reality.”
Researchers at QCRI and Kanari AI recently built models that can achieve human parity on Arabic broadcast news. The system demonstrates the impact of subtitling Aljazeera’s daily reports. While the human error rate (HER) for English is around 5.6%, the research found that the Arabic HER is significantly higher and can reach 10%, owing to the morphological complexity of the language and the lack of standard orthographic rules in dialectal Arabic. Thanks to recent advances in deep learning and end-to-end architectures, the Arabic speech recognition engine manages to outperform native speakers on broadcast news.
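To give a sense of what such error rates measure, here is a minimal Python sketch of word error rate, the standard metric for scoring a transcript against a reference. It assumes that the human error rate quoted above is computed the same way for human transcribers, which the article does not spell out; the function name `word_error_rate` and the example sentences are illustrative, not taken from the research.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Counts the minimum number of word substitutions, deletions, and
    insertions needed to turn the reference into the hypothesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    transcript = "the cat sit on mat"  # one substitution, one deletion
    print(f"{word_error_rate(reference, transcript):.1%}")
```

On this scale, an error rate of 10% corresponds to roughly one wrong, missing, or extra word for every ten words in the reference.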
While speech recognition appears to work well for Modern Standard Arabic, researchers at QCRI and Kanari AI are busy testing the limits of dialectal processing and getting great results. Since nobody speaks Modern Standard Arabic at home, attention to dialect is what we need for our voice assistants to understand us.
This content was authored by the Qatar Computing Research Institute at Hamad Bin Khalifa University, a member of the Qatar Foundation. It was not written by the editorial staff of the MIT Technology Review.