SinLlama: Sri Lanka’s First Sinhala Large Language Model

In a landmark achievement for local language technology, researchers at the University of Moratuwa have developed Sri Lanka’s very first large-scale Sinhala-only Large Language Model (LLM). Named SinLlama, this groundbreaking model marks a new chapter in making artificial intelligence more accessible and relevant to Sri Lankan communities.

Building SinLlama

SinLlama was created by extending Llama-3-8B, a leading open-source multilingual LLM. The research team enhanced the model’s tokenizer with Sinhala-specific vocabulary and continually pre-trained it on a 10 million–sentence Sinhala corpus, carefully cleaned and curated for accuracy.
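To see why adding Sinhala-specific vocabulary matters, here is a minimal Python sketch. It is not the team's code: the greedy longest-match tokenizer, the tiny `base_vocab`, and the two added Sinhala entries are all hypothetical stand-ins for Llama-3's real BPE tokenizer and the actual vocabulary extension. It only illustrates the general effect, where text that falls outside the vocabulary is broken into many byte-level pieces, and dedicated entries shorten the sequence.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer with UTF-8 byte fallback (toy stand-in)."""
    max_len = max(len(entry) for entry in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest vocabulary entry that starts at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No entry matched: emit one token per UTF-8 byte of this character.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


# Hypothetical base vocabulary: lowercase Latin letters and the space character.
base_vocab = set("abcdefghijklmnopqrstuvwxyz ")

# Hypothetical Sinhala entries (roughly "Sinhala" and "language").
sinhala_entries = ["සිංහල", "භාෂාව"]
extended_vocab = base_vocab | set(sinhala_entries)

text = " ".join(sinhala_entries)
base_tokens = tokenize(text, base_vocab)
extended_tokens = tokenize(text, extended_vocab)

print(len(base_tokens), "tokens with the base vocabulary")
print(len(extended_tokens), "tokens with the extended vocabulary")
```

In a real pipeline, every new vocabulary entry also needs a new row in the model's embedding matrix, and those rows start out untrained. That is one reason an extended tokenizer is typically paired with continued pre-training, as described above. With the Hugging Face Transformers library, for example, this is commonly done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, though the article does not say which tooling the team used.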

The project was led by the Department of Computer Science and Engineering at the University of Moratuwa. According to the team, SinLlama is the largest Sinhala LLM to date and, importantly, the first open-source decoder-based model with explicit Sinhala support.

Outperforming Global Models

When benchmarked on three Sinhala text classification tasks, SinLlama significantly outperformed both the base and instruction-tuned versions of Llama-3-8B. This performance boost demonstrates that with dedicated training and language-specific adaptation, global AI frameworks can be localized to excel in low-resource languages like Sinhala.
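For readers who want to run a comparison of this kind themselves, the sketch below shows one common way to score a text classifier. The article does not name the metric used, so macro-averaged F1 is an assumption here, and the labels and predictions are invented placeholders, not results from SinLlama or Llama-3-8B.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Invented toy data, purely to exercise the function.
toy_gold = ["pos", "neg", "neu", "pos", "neg", "neu"]
toy_pred_model_a = ["pos", "neg", "pos", "pos", "neu", "neu"]
toy_pred_model_b = ["pos", "neg", "neu", "pos", "neg", "pos"]

for name, pred in [("model_a", toy_pred_model_a), ("model_b", toy_pred_model_b)]:
    print(name, round(macro_f1(toy_gold, pred), 3))
```

Macro averaging weights every class equally, which makes it a reasonable choice when some classes have far fewer examples than others.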

A Step Forward for Local Innovation

Both the model and the 10 million–sentence dataset have been made freely available for researchers, developers, and innovators. Officials said the release of SinLlama and its dataset is expected to support wider research and innovation, ensuring that Sri Lankan languages thrive in the emerging AI era rather than being left behind.

By making these resources open, the University of Moratuwa ensures that Sri Lanka is not just a consumer of AI technology but also a contributor to global knowledge and development.

Access and Impact

With SinLlama, innovators can now explore opportunities to build AI-powered educational tools, content platforms, translation systems, and localized digital services in Sinhala. This is a step toward a future where AI speaks the language of the people—and empowers them in the process.

🔗 Access the dataset here: lnkd.in/gi43HaXg


Proudly Sri Lankan celebrates SinLlama as a shining example of homegrown innovation—proof that even in the global AI race, Sri Lanka is making its mark.
