Singh, S., Sharma, A., & Tiwari, S. (2026). Empirical Benchmarking of Vision-Language Transformer Combinations for Visual Question Answering Tasks. DMPedia Lecture Notes in Computer Science & Engineering, IMPACT26, 199-209. https://digitalmanuscriptpedia.com/conferences/index.php/DMP-LNCSE/article/view/144