Cross-Lingual Language Modeling for Nepali: Enhancing Low-Resource NLP
Keywords:
Cross-lingual, Nepali, XLM-R, Masked Language Modeling, Translation Language Modeling, low-resource NLP
Abstract
This paper investigates improving multilingual language models for Nepali, a low-resource language. We take a two-step approach: first, we continue pretraining the model with Masked Language Modeling (MLM) on monolingual Nepali text; then we train with Translation Language Modeling (TLM) on English-Nepali parallel text. Our experiments show that adapting the XLM-R Base model in this way reduces Nepali perplexity and improves performance on downstream tasks such as machine translation, sentiment analysis, and question answering. We also share our methods and results to support future research on underrepresented South Asian languages.
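The TLM step described in the abstract trains on concatenated English-Nepali sentence pairs, masking tokens on both sides so the model can use cross-lingual context to recover them. As a minimal sketch of how such a training example can be constructed (the function name, separator token, and masking details here are illustrative assumptions, not the paper's implementation):

```python
import random

def make_tlm_example(src_tokens, tgt_tokens, mask_prob=0.15, rng=None):
    """Build one Translation Language Modeling (TLM) example.

    The source (e.g. English) and target (e.g. Nepali) token sequences
    are joined with a separator; tokens on BOTH sides are randomly
    replaced with "<mask>", so recovering a masked Nepali token can
    draw on the aligned English context, and vice versa.

    Returns (inputs, labels): labels[i] holds the original token where
    inputs[i] was masked, and None at positions with no prediction loss.
    """
    rng = rng or random.Random(0)
    tokens = src_tokens + ["</s>"] + tgt_tokens
    inputs, labels = [], []
    for tok in tokens:
        if tok != "</s>" and rng.random() < mask_prob:
            inputs.append("<mask>")   # hide the token in the input
            labels.append(tok)        # model must predict the original
        else:
            inputs.append(tok)
            labels.append(None)       # unmasked: no loss here
    return inputs, labels

# Example: a tiny English-Nepali pair, masked at a higher rate for demo.
inp, lab = make_tlm_example(
    ["I", "love", "books"], ["मलाई", "किताब", "मनपर्छ"],
    mask_prob=0.5, rng=random.Random(42),
)
```

In practice this construction would feed a subword-tokenized batch into XLM-R's MLM head; the sketch only shows the input/label pairing that distinguishes TLM (bilingual concatenation) from plain MLM (monolingual text).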
Published
13-03-2026
Conference Proceedings Volume
Section
Articles
License
Copyright (c) 2026 DMPedia Lecture Notes in Multidisciplinary Research

This work is licensed under a Creative Commons Attribution 4.0 International License.
How to Cite
Soubam, D., Gupta, V., & Sivaraj, A. (2026). Cross-Lingual Language Modeling for Nepali: Enhancing Low-Resource NLP. DMPedia Lecture Notes in Multidisciplinary Research, IMPACT26, 430-436. https://digitalmanuscriptpedia.com/conferences/index.php/DMP-LNMR/article/view/80