[1]
S. Singh, A. Sharma, and S. Tiwari, “Empirical Benchmarking of Vision-Language Transformer Combinations for Visual Question Answering Tasks,” DMP-LNCSE, no. IMPACT26, pp. 199–209, Mar. 2026, Accessed: Mar. 29, 2026. [Online]. Available: https://digitalmanuscriptpedia.com/conferences/index.php/DMP-LNCSE/article/view/144