Sri Lanka’s Multilingual AI Initiative: The ‘Sri Lankan Open Language Initiative for Data’ (SOLID)

janathb@icta.lk

GIC

Sri Lanka’s Information and Communication Technology Agency (ICTA) is leading an ambitious initiative to integrate artificial intelligence (AI) into public services and to address the challenges posed by the nation’s linguistic diversity. The effort, known as the ‘Sri Lankan Open Language Initiative for Data’ (SOLID), aims to create AI systems that can interact seamlessly in Sinhala, Tamil, and English. The project uses large language models (LLMs) for text, and speech-to-text (STT) and text-to-speech (TTS) technologies for audio.

Strategic Objectives

The primary goal is to develop a “gold standard” open-access repository of high-quality data to train AI models. This involves a coordinated national effort to digitize and translate data from government and other archives to create domain-specific datasets for Sinhala and Tamil. All outputs will be publicly available without restrictions to foster transparency and innovation. This initiative is a core component of Sri Lanka’s broader National AI Strategy, which emphasizes the responsible adoption of AI to drive economic growth and digital transformation.

A key component is the AI-Centered Trilingual Government Information Center (GIC). The GIC will initially focus on English before expanding to high-quality Sinhala and Tamil voice support, backed by datasets for automatic speech recognition (ASR). This phased approach is intended to create an inclusive environment for all three languages and prevent disparities in who benefits from AI.

The Importance of Local Language AI

For Sri Lanka, developing AI in local languages is crucial for digital inclusion. Sinhala and Tamil are the country’s official languages and are widely spoken, whereas English proficiency across the population is relatively low. This creates a significant barrier: existing AI technologies perform well in English but poorly in low-resource languages such as Sinhala and Tamil. Because global technology companies prioritize high-resource languages, which they perceive as more commercially rewarding, local initiatives are essential if Sri Lankans are to fully benefit from modern AI technologies.

Data and Accessibility

The project’s success hinges on data. The initiative will create and publicly release Sinhala and Tamil datasets, allowing anyone to use them to build more effective AI technologies. Initial training data for machine translation systems will come from previous projects by the University of Moratuwa. Subsequently, parallel data will be generated automatically from government documents, textbooks, and news. Professional translators will be enlisted to create and curate test sets, which will provide a reliable benchmark for the quality and accuracy of system outputs.

Economic Implications

The economic implications of this initiative are significant. By building independent AI services, Sri Lanka can reduce its reliance on major corporations and empower local researchers, data scientists, and engineers. This will strengthen the national talent pool and allow the country to harness open-source advancements from around the world. The initiative is expected to accelerate the growth of Sri Lanka’s digital economy, which is projected to reach USD 15 billion, with AI contributing significantly to that growth.

Ultimately, this push for multilingual AI not only enhances public service efficiency but also serves as a model for inclusive technological adoption in diverse societies, promising economic resilience and societal empowerment for Sri Lanka’s future.
