Multilingual data is essential for enabling global conversational AI. Rich datasets are used to build systems that can deliver experiences that feel natural in a target culture. However, datasets often focus too heavily on “standard” language usage and don’t take into account local market realities and the rich variation in human language production. In this talk, Aaron Schliem, Senior AI Solutions Architect at Welocalize, will offer insights to help ML and AI teams source datasets that are more representative of local cultural realities.
Although Arabic is one of the most widely spoken languages in the world and one of the six official languages of the UN, language technology for Arabic often lags far behind that of European languages. Arabic language technology has not been able to profit from the kind of resources available for many much smaller European languages, and there are very few tools available even for fundamental NLP tasks such as segmentation.
Telephone and video interpreting have been around for a few decades, enabling consecutive interpreters to work remotely. More recently, several credible Remote Simultaneous Interpreting (RSI) platforms have appeared on the market, allowing conference interpreters to deliver their services as well without having to be present at the venue. Eliminating interpreter booths and reducing travel and accommodation expenses have dramatically lowered the total delivery cost of simultaneous interpreting.