Anomalous Activity Recognition Using Pose Estimation and Incremental Learning
DOI: https://doi.org/10.65890/race.v2i1.164

Keywords: Human Action Recognition, Anomaly Detection, YOLO Pose, One-Class SVM, Incremental Learning, Real-time Surveillance, Skeleton-based Analysis

Abstract
Automated surveillance relies on detecting abnormal human behavior in video streams. In practice, this task remains challenging because abnormal events are inherently rare and difficult to capture in large-scale annotated datasets. Consequently, most supervised approaches depend heavily on labeled instances of anomalies and often generalize poorly to real-world environments, where novel or previously unseen behaviors may arise. To address this weakness, a pose-based anomaly detection system is proposed that learns patterns of normal human motion. Rather than modeling specific abnormal activities, the system learns normal patterns of behavior and flags significant deviations as anomalies. Video frames are processed to extract human skeletal keypoints using the YOLO-Pose model, which predicts 17 body joints per person. Keypoints are normalized with respect to the bounding-box coordinates to achieve scale and position invariance. Spatial and temporal motion features derived from the skeleton representation capture movement dynamics and body posture. The distribution of normal poses is modeled by an incremental one-class support vector machine (SGDOneClassSVM) trained only on normal samples. Experiments on the ShanghaiTech Campus dataset show that the proposed method achieves a detection accuracy of 91.0% and operates at about 21 frames per second.
Data Availability Statement
The NTU RGB+D 120 dataset used in this study is publicly available at: https://rose1.ntu.edu.sg/dataset/actionRecognition/
License
Copyright (c) 2026 Revolutionary Advances in Computing and Electronics: An International Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.