Cross-Lingual Language Modeling for Nepali: Enhancing Low-Resource NLP

Authors

  • Daniel Soubam, Department of Computer Science and Engineering, Sharda University, Greater Noida, U.P., India
  • Vivek Gupta, Department of Computer Science and Engineering, Sharda University, Greater Noida, U.P., India
  • Aparna Sivaraj, Department of Computer Science and Engineering, Sharda University, Greater Noida, U.P., India

Keywords:

Cross-lingual, Nepali, XLM-R, Masked Language Modeling, Translation Language Modeling, low-resource NLP

Abstract

This paper addresses improving multilingual language models for Nepali, a low-resource language. We take a two-step approach: first, we continue pre-training the model with Masked Language Modeling (MLM) on Nepali-only text; then we train further with Translation Language Modeling (TLM) on English-Nepali parallel text. Our experiments show that adapting the XLM-R Base model in this way reduces Nepali perplexity and improves performance on downstream tasks, including machine translation, sentiment analysis, and question answering. We also release our methods and results to support future research on underrepresented South Asian languages.
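The MLM objective referenced in the abstract relies on corrupting a fraction of input tokens and asking the model to recover them; TLM applies the same corruption to a concatenated English-Nepali sentence pair so the model can attend across languages. The sketch below illustrates the standard BERT-style masking recipe (15% of positions selected; of those, 80% replaced by the mask token, 10% by a random token, 10% left unchanged). The token ids, `MASK_ID`, and `VOCAB_SIZE` are hypothetical placeholders for illustration, not the paper's actual configuration (XLM-R uses a SentencePiece vocabulary of ~250k tokens).

```python
import random

MASK_ID = 4        # hypothetical <mask> token id, for illustration only
VOCAB_SIZE = 1000  # hypothetical vocabulary size, for illustration only
IGNORE = -100      # label value ignored by the cross-entropy loss

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """BERT-style dynamic masking for MLM/TLM.

    Returns (inputs, labels): inputs is the corrupted sequence;
    labels holds the original id at selected positions and IGNORE
    elsewhere, so the loss is computed only on masked positions.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok          # predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID              # 80%: replace with <mask>
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

# TLM differs only in the input: mask a concatenated parallel pair,
# e.g. english_ids + nepali_ids, so context from one language can
# help recover masked tokens in the other.
english_ids = [10, 11, 12, 13]   # hypothetical tokenized English sentence
nepali_ids = [20, 21, 22]        # hypothetical tokenized Nepali sentence
tlm_inputs, tlm_labels = mask_tokens(english_ids + nepali_ids)
```

In practice this corruption is applied on the fly each epoch (dynamic masking), so the model sees different masked positions on each pass over the corpus.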

Published

13-03-2026

How to Cite

Soubam, D., Gupta, V., & Sivaraj, A. (2026). Cross-Lingual Language Modeling for Nepali: Enhancing Low-Resource NLP. DMPedia Lecture Notes in Multidisciplinary Research, IMPACT26, 430-436. https://digitalmanuscriptpedia.com/conferences/index.php/DMP-LNMR/article/view/80