AI-Based Fake News Detection Using NLP and Machine Learning Techniques: A Review

Apoorv Mahere; Ashutosh; Ayush Srivastava; Abhay Bhardwaj

doi:10.65890/race.v2i1.182

Authors

Apoorv Mahere, Department of CSE, Galgotias College of Engineering and Technology, Uttar Pradesh, India
Ashutosh, Department of CSE, Galgotias College of Engineering and Technology, Uttar Pradesh, India
Ayush Srivastava, Department of CSE, Galgotias College of Engineering and Technology, Uttar Pradesh, India
Abhay Bhardwaj, Department of CSE, Galgotias College of Engineering and Technology, Uttar Pradesh, India


Keywords:

Fake News Detection, Natural Language Processing, TF-IDF, Logistic Regression, Decision Tree, Gradient Boosting, Random Forest, Text Classification

Abstract

Digital communication channels have radically reshaped how information is created, engaged with, and disseminated within societies. This democratisation of content has empowered users worldwide, but it has also made it easy to spread fabricated stories, commonly known as fake news. Traditional fact-checking methods cannot keep pace with the speed and scale of online misinformation, so automated computational systems are needed that can assess textual credibility at scale. This paper provides a comprehensive theoretical framework for developing an AI-driven fake news detection system based on the principles of classical Machine Learning (ML) and Natural Language Processing (NLP). Empirical studies are deliberately excluded; the emphasis is on the theoretical, mathematical and linguistic aspects needed to design such a system. The framework covers dataset characterisation, text pre-processing, and TF-IDF vectorisation. Four classification algorithms, namely Logistic Regression, Decision Tree, Random Forest and Gradient Boosting, are then examined in detail in terms of their mathematical formulation, their suitability for high-dimensional sparse text data, and their ability to capture the linguistic patterns typical of deceptive content. Rather than reporting experimental results, the paper offers a structured conceptual framework that lays the groundwork for future experiments, performance benchmarking, and deployment in misinformation detection applications.
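The pipeline outlined in the abstract (pre-processing, TF-IDF vectorisation, then a classifier) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the stop-word list, the smoothed IDF variant (the one scikit-learn's `TfidfVectorizer` uses by default), the gradient-descent hyperparameters `lr`, `epochs` and `l2`, the helper names, and the four toy headlines are all hypothetical choices made for illustration. Only Logistic Regression is shown; Decision Tree, Random Forest and Gradient Boosting would consume the same sparse TF-IDF vectors.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on", "for"}

# Hypothetical toy headlines (label 1 = fake, 0 = real); not a real dataset.
TOY_TEXTS = [
    "Shocking secret cure doctors hate revealed",
    "You will not believe this shocking miracle",
    "Parliament passes annual budget after debate",
    "Central bank holds interest rates steady",
]
TOY_LABELS = [1, 1, 0, 0]


def preprocess(text: str) -> list[str]:
    """Lower-case, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1 for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector (raw count times IDF), L2-normalised; unseen terms ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: dict[str, float], w: dict[str, float], b: float) -> float:
    """P(fake | x) = sigmoid(w . x + b) over a sparse feature dict."""
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


def train_logreg(xs, ys, lr=0.5, epochs=200, l2=0.01):
    """Full-batch gradient descent on L2-regularised mean log loss."""
    w: dict[str, float] = {}
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w: dict[str, float] = {}
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(x, w, b) - y  # d(log loss)/d(score)
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] = w.get(t, 0.0) - lr * (grad_w.get(t, 0.0) / n + l2 * w.get(t, 0.0))
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    docs = [preprocess(t) for t in TOY_TEXTS]
    idf = fit_idf(docs)
    w, b = train_logreg([tfidf(d, idf) for d in docs], TOY_LABELS)
    query = tfidf(preprocess("Shocking miracle cure revealed"), idf)
    print(f"P(fake) = {predict_proba(query, w, b):.3f}")
```

Replacing `train_logreg` with a tree-based learner would change only the classification stage; the pre-processing and TF-IDF steps stay the same. In practice, library components such as scikit-learn's `TfidfVectorizer` and `LogisticRegression` would take the place of this hand-rolled code.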
