Beyond Inference: Exploratory Perspectives on Leveraging NPUs for Scheduling and Optimization

Authors

  • Mrigansh Chaudhary, Department of Computer Science and Engineering, Graphic Era Deemed to Be University, Clement Town, Dehradun (Uttarakhand)-248002
  • Kartikay Srivastava, Department of Computer Science and Engineering, Graphic Era Deemed to Be University, Clement Town, Dehradun (Uttarakhand)-248002
  • Utkarsh Pant, Department of Computer Science and Engineering, Graphic Era Deemed to Be University, Clement Town, Dehradun (Uttarakhand)-248002

Keywords:

Neural Processing Unit, Scheduling, Optimization, CPU–NPU Cooperation, Resource Management

Abstract


Neural Processing Units (NPUs) are becoming integral components of modern computing systems, primarily by accelerating inference workloads in areas such as computer vision, natural language processing, and recommendation systems. Despite this growing prevalence, NPUs remain underutilized in many deployments, consuming energy while sitting idle outside of inference contexts. At the same time, CPU scheduling continues to face challenges in balancing throughput, latency, efficiency, and energy consumption under highly variable workloads. Current scheduling paradigms focus primarily on balancing computational throughput and minimizing latency. For example, the Linux kernel recently adopted the Earliest Eligible Virtual Deadline First (EEVDF) scheduler to better address latency handling, building on the earlier Completely Fair Scheduler (CFS), which emphasized fairness. Yet in real-world scenarios, workload variance is often high, and decisions such as prioritizing latency versus throughput versus efficiency are typically made statically, based on expected conditions rather than actual runtime demands. As a result, these scheduling parameters rarely adapt in real time to dynamic workloads and real-world usage patterns.
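The EEVDF selection rule mentioned above can be illustrated with a small sketch: each task is eligible once it has not run ahead of the system's average virtual runtime, and among eligible tasks the scheduler picks the one with the earliest virtual deadline. The task fields and the simplified eligibility/deadline arithmetic below are illustrative assumptions for clarity, not the Linux kernel's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    vruntime: float   # virtual runtime consumed so far
    slice: float      # requested time slice
    weight: float     # scheduling weight (higher = more priority)

def virtual_deadline(task: Task) -> float:
    # Simplified: deadline = current virtual runtime plus the
    # requested slice scaled down by the task's weight.
    return task.vruntime + task.slice / task.weight

def pick_next(tasks, avg_vruntime):
    # A task is eligible if it has not run ahead of the average
    # virtual runtime; among eligible tasks, choose the one with
    # the earliest virtual deadline.
    eligible = [t for t in tasks if t.vruntime <= avg_vruntime]
    return min(eligible, key=virtual_deadline, default=None)

# A latency-sensitive task with a short slice wins over a batch task
# with a long slice, even though both are eligible.
tasks = [
    Task("latency_sensitive", vruntime=10.0, slice=1.0, weight=2.0),
    Task("batch", vruntime=9.0, slice=10.0, weight=1.0),
]
chosen = pick_next(tasks, avg_vruntime=10.0)
```

Here the latency-sensitive task's virtual deadline (10.5) precedes the batch task's (19.0), so it is selected first; this is the mechanism by which EEVDF improves latency handling relative to pure fairness.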

Together, these observations highlight an opportunity to explore the potential of NPUs beyond traditional inference acceleration. By leveraging their parallel processing capabilities and low-power characteristics, NPUs could assist or even take a more active role in dynamic scheduling, enabling adaptive resource management that responds to real-time workload demands. This paper presents an exploratory framework for investigating the integration of NPUs into scheduling. Rather than reporting empirical results, it outlines conceptual pathways and architectural possibilities.
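One conceptual pathway the abstract envisions can be sketched as a feedback loop: an NPU-hosted model predicts near-future workload characteristics, and the scheduler adjusts its latency/throughput trade-off accordingly. All names below (`SchedulerParams`, `predict_workload`, `adapt`) are hypothetical, and the predictor is a trivial heuristic standing in for an actual NPU-resident model; this is a sketch of the idea, not a proposed implementation.

```python
from dataclasses import dataclass

@dataclass
class SchedulerParams:
    base_slice_ms: float   # nominal time slice handed to tasks

def predict_workload(recent_latencies_ms):
    # Stand-in for an NPU-hosted model: classify the recent window
    # as latency- or throughput-dominated from observed latencies.
    avg = sum(recent_latencies_ms) / len(recent_latencies_ms)
    return "latency" if avg > 5.0 else "throughput"

def adapt(params, prediction):
    # Shrink slices when latency pressure is predicted; grow them
    # to amortize context-switch cost when throughput dominates.
    if prediction == "latency":
        return SchedulerParams(base_slice_ms=params.base_slice_ms / 2)
    return SchedulerParams(base_slice_ms=params.base_slice_ms * 2)

# One iteration of the loop: high observed latencies lead the
# scheduler to halve its time slice.
params = SchedulerParams(base_slice_ms=4.0)
params = adapt(params, predict_workload([8.0, 9.0, 10.0]))
```

The design point being illustrated is that the prediction step, which is the expensive part of such a loop, is exactly the kind of parallel, low-power computation an otherwise idle NPU could absorb without burdening the CPU.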

Published

13-03-2026

How to Cite

Chaudhary, M., Srivastava, K., & Pant, U. (2026). Beyond Inference: Exploratory Perspectives on Leveraging NPUs for Scheduling and Optimization. DMPedia Lecture Notes in Multidisciplinary Research, IMPACT26, 219-225. https://digitalmanuscriptpedia.com/conferences/index.php/DMP-LNMR/article/view/71