Carbon Prompting: An Empirical Analysis of LLM-Based Requirement Classification
Keywords:
Carbon emissions, Chain-of-Thought (CoT), CodeCarbon, Large Language Models (LLMs), prompt engineering, PROMISE dataset

Abstract
The growing use of Large Language Models (LLMs) in real-world applications has raised new questions about their environmental impact, especially during inference. While much research has focused on the energy demands of model training, this study draws attention to the often-overlooked emissions generated when these models are used at scale. We introduce the idea of carbon prompting, which examines how different prompt designs influence the energy use and carbon output of LLM inference. Using the PROMISE dataset for classifying functional and non-functional requirements, we tested nine prompting strategies with the LLaMA 3.2 model: zero-shot, few-shot, Chain-of-Thought, detailed, JSON-based, self-critique, expert-engineer, instruction-few-shot, and no-explanation formats. Energy consumption and CO₂ emissions were tracked using CodeCarbon, allowing a detailed comparison of performance and environmental cost. The results show that simpler prompts, particularly the no-explanation and zero-shot formats, achieved the best balance between accuracy and energy efficiency. More elaborate prompts, although designed to elicit deeper reasoning, produced higher emissions without meaningful accuracy gains, largely because of unnecessary token generation. These findings emphasize that prompt design should consider not only model performance but also sustainability, positioning environmental impact as a key metric in future AI development and evaluation.
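As a rough illustration of the measurement setup described in the abstract, the sketch below wraps one prompting strategy's inference loop in CodeCarbon's EmissionsTracker. The prompt templates and the classify_requirement() call are illustrative assumptions; the abstract does not specify the exact LLaMA 3.2 inference stack or prompt wording.

```python
# Minimal sketch: measure CO2-eq per prompting strategy with CodeCarbon.
# classify_requirement() and the PROMPTS texts are hypothetical placeholders.
from codecarbon import EmissionsTracker

PROMPTS = {
    "zero_shot": "Classify this requirement as functional (F) or non-functional (NF): {req}",
    "no_explanation": "Answer only 'F' or 'NF', with no explanation: {req}",
    # ... remaining strategies (few-shot, Chain-of-Thought, JSON-based, self-critique, ...)
}

def classify_requirement(prompt: str) -> str:
    """Placeholder for a LLaMA 3.2 inference call (backend not specified in the abstract)."""
    raise NotImplementedError

def run_strategy(name: str, requirements: list[str]) -> float:
    """Run one prompting strategy over the dataset and return emissions in kg CO2-eq."""
    tracker = EmissionsTracker(project_name=f"carbon-prompting-{name}")
    tracker.start()
    try:
        for req in requirements:
            classify_requirement(PROMPTS[name].format(req=req))
    finally:
        emissions_kg = tracker.stop()  # CodeCarbon reports cumulative emissions in kg CO2-eq
    return emissions_kg
```

Comparing the per-strategy emissions returned by run_strategy() against classification accuracy is one way to reproduce the accuracy-versus-energy trade-off the study reports.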
License
Copyright (c) 2026 DMPedia Lecture Notes in Multidisciplinary Research

This work is licensed under a Creative Commons Attribution 4.0 International License.