SINGH, Satyam; SHARMA, Aditya; TIWARI, Smita. Empirical Benchmarking of Vision-Language Transformer Combinations for Visual Question Answering Tasks. DMPedia Lecture Notes in Computer Science & Engineering, U.P., India, no. IMPACT26, p. 199–209, 2026. Available at: https://digitalmanuscriptpedia.com/conferences/index.php/DMP-LNCSE/article/view/144. Accessed: 30 Jun. 2026.