Small Changes in AI Models Can Yield Big Energy Savings


Small changes in the large language models (LLMs) at the heart of AI applications can result in substantial energy savings, according to a report released by the United Nations Educational, Scientific and Cultural Organization (UNESCO) on Monday.

The 35-page report titled “Smarter, smaller, stronger: resource-efficient generative AI & the future of digital transformation” outlines three ways AI developers and users can reduce the power gluttony of the technology.

1. Use smaller models.

For many tasks, smaller models can be just as smart and accurate as large ones, according to the report. Small models tailored to specific tasks can cut energy use by up to 90%, it maintained.

Currently, users rely on large, general-purpose models for all their needs, it explained. Research shows that using smaller models tailored to specific tasks, like translation or summarization, can cut energy use significantly without losing performance. Matching the right model to the right job, rather than relying on one large, all-purpose system for everything, is a smarter, more cost- and resource-efficient approach, the report continued.

What’s more, energy-efficient, small models are more accessible in low-resource environments with limited connectivity, offer faster response times, and are more cost-effective.
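The report's "right model for the right job" advice can be sketched as a simple routing table. The model names and per-query energy figures below are invented purely for illustration; real numbers depend heavily on hardware and workload.

```python
# Toy sketch of task-based model routing. Model names and joule
# figures are hypothetical, invented only to illustrate the idea.
MODELS = {
    "translate": {"name": "small-translator", "params_b": 0.3, "joules": 5},
    "summarize": {"name": "small-summarizer", "params_b": 0.5, "joules": 8},
    "default":   {"name": "large-general",    "params_b": 70,  "joules": 90},
}

def pick_model(task: str) -> dict:
    """Route a task to a small specialized model when one exists."""
    return MODELS.get(task, MODELS["default"])

def energy_saved(task: str) -> float:
    """Fraction of energy saved versus always using the large model."""
    small = pick_model(task)["joules"]
    large = MODELS["default"]["joules"]
    return 1 - small / large

print(f"translation saves {energy_saved('translate'):.0%}")
```

With these toy numbers, routing translation to the small model saves roughly 94% of the energy of the general-purpose model, in the same spirit as the report's up-to-90% claim.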

2. Use shorter prompts and responses.

Streamlining input queries and response lengths can reduce energy use by over 50%, the report noted. It added that shortening inputs and outputs also reduces the cost of running LLMs.
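A crude way to see the effect of trimming a prompt is to strip filler words and compare rough token counts. The whitespace-based count below is a stand-in for a real tokenizer, and the filler list is an illustrative assumption, not a production approach.

```python
# Toy sketch: strip pleasantries from a prompt and estimate the token
# reduction with a crude whitespace count (real tokenizers differ).
FILLER = {"hello,", "hi,", "please", "kindly", "could", "you", "thanks!"}

def trim_prompt(prompt: str) -> str:
    """Drop common pleasantry words; keep the substantive request."""
    return " ".join(w for w in prompt.split() if w.lower() not in FILLER)

def rough_tokens(text: str) -> int:
    return len(text.split())

verbose = "Hello, could you please kindly summarize this report for me"
concise = trim_prompt(verbose)

saving = 1 - rough_tokens(concise) / rough_tokens(verbose)
print(concise)
print(f"~{saving:.0%} fewer tokens")  # ~50% fewer in this toy example
```

Even this toy example halves the token count, echoing the report's claim that streamlined inputs can cut energy use by more than half.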

3. Use compression to shrink the size of the model.

Model compression techniques, such as quantization, can achieve energy savings of up to 44% by reducing computational complexity, the report explained. Compression also reduces the cost of running LLMs by shrinking their size and making them faster.
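The size benefit of quantization is easy to see with back-of-the-envelope arithmetic: storing each weight in fewer bits shrinks the model proportionally. The 7-billion-parameter example below is an assumed size, not one taken from the report.

```python
# Back-of-the-envelope sketch of how quantization shrinks the memory
# needed for a model's weights. A 7B-parameter model is an assumed size.
def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Memory needed to store the weights alone, in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

params = 7e9  # assumed 7B-parameter model
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):5.1f} GB")
# 32-bit: 28.0 GB, 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB
```

Quantizing from 32-bit to 8-bit weights cuts weight memory by 4x, which is a large part of why compressed models are cheaper and faster to serve.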

Why Smaller Models Use Less Energy

Smaller AI models consume less energy because they have less work to do. “Smaller AI models — what we call small language models — require fewer parameters, less memory, and significantly less GPU throughput,” explained Jim Olsen, CTO of ModelOp, a governance software company, in Chicago.

“That means lower power consumption during both training and inference,” he told TechNewsWorld. “You’re not running billions of operations per token. You’re optimizing for precision in a tighter domain, which leads to more sustainable compute costs.”

Larger models have vastly more parameters than smaller ones, and each time a model is asked a question, it must perform mathematical calculations across all of those parameters to generate an answer.

“More parameters mean more calculations, which require more processing power from the GPUs and, therefore, consume more energy,” said Wyatt Mayham, head of AI consulting at Northwest AI Consulting (NAIC), a global provider of AI consulting services.

“It’s the digital equivalent of a V8 engine burning more gas than a four-cylinder, even when just idling,” he told TechNewsWorld. “A smaller, more specialized model simply has less computational overhead for each task.”
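The parameters-to-compute relationship Mayham describes can be put in rough numbers with a common rule of thumb: dense transformer inference costs on the order of two floating-point operations per parameter per generated token. The 1B and 70B sizes below are assumed examples.

```python
# Rule-of-thumb sketch: dense transformer inference costs roughly
# 2 floating-point operations per parameter per generated token,
# so compute (and energy) scales linearly with model size.
def flops_per_token(params: float) -> float:
    return 2 * params

small, large = 1e9, 70e9  # assumed 1B vs. 70B parameter models
ratio = flops_per_token(large) / flops_per_token(small)
print(f"the 70B model does ~{ratio:.0f}x the work per token")
```

Under this approximation, the 70B model does about 70 times the work per token, which is Mayham's V8-versus-four-cylinder point in arithmetic form.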

Sagar Indurkhya, chief scientist at Virtualitics, an AI-powered analytics company, in Pasadena, Calif., contended that while smaller LLMs typically do not perform as well as larger frontier models, they can be fine-tuned on specific relevant data, such as proprietary data that cannot be shared outside a company, so that the tuned model's performance on very specific tasks is competitive with that of frontier models.

“If the goal is reducing power consumption for AI agents,” he told TechNewsWorld, “use and adaptation of smaller LLMs is a path forward any company should carefully consider.”

Cutting Chatty Prompts Saves Energy

Although AI models are often referred to as chatbots, it doesn’t pay to be chatty with the AI. “The model understands your intent,” said Mel Morris, CEO of Corpora.ai, maker of an AI search engine, in Derby, England.

“It doesn’t need pleasantries,” he told TechNewsWorld. “It doesn’t really want them. They don’t do it any good, but the model still has to process those additional words, and that costs compute time.”

Ian Holmes, director and global lead for enterprise fraud solutions at SAS, a software company that specializes in analytics, artificial intelligence, and data management solutions, in Cary, N.C., agreed that prompt brevity can be an energy saver. “It can be potentially quite impactful in reducing the overall energy footprint of AI interactions,” he told TechNewsWorld. “The more unnecessarily complex a prompt is, the more computational power will be required for the LLM to interpret and respond.”

“It’s easy to treat an LLM like a knowledgeable friend, engaging in long, chatty exchanges, but this can unintentionally increase the model’s workload,” he said. “Keeping prompts concise and focused helps reduce the amount of data the model needs to process. That, in turn, can lower the compute power required to generate a response.”

Shorter prompts, however, are not always practical. “Many prompts contain unnecessary context or examples that could be trimmed,” acknowledged Charles Yeomans, CEO and co-founder of AutoBeam, a data compaction and transmission optimization company, in Moraga, Calif.

“However, some tasks inherently require detailed prompts for accuracy,” he told TechNewsWorld. “The key is eliminating redundancy, not sacrificing necessary information.”

There can be a trade-off when it comes to shorter prompts, added Axel Abulafia, chief business officer with CloudX, a software engineering and AI solutions company in Manalapan, N.J. “Smaller prompts are better on paper, but if the error rate of these prompts is double or triple versus a prompt that is only 50% larger, then the equation is clear,” he told TechNewsWorld. “I’d say that smarter prompts can save much more energy than only smaller ones.”

The challenge lies in maintaining quality, added NAIC’s Mayham. “A prompt that is too brief may lack the necessary context for the model to provide a useful or accurate response,” he said. “Likewise, forcing a response to be artificially short might strip it of important nuance.”

“It becomes a balancing act for developers,” he continued. “They need to design prompts that are concise yet contextually rich enough to get the job done. For many routine tasks, this is achievable, but for complex problem-solving, longer and more detailed interactions are often unavoidable.”

Risks and Rewards of Model Compression

UNESCO’s call for shrinking models can have drawbacks, too. “The primary risk is that you can compress a model too much and harm its performance,” Mayham noted. “Overly aggressive pruning or quantization can lead to a drop in accuracy, logical reasoning ability, or nuance, which might make the model unsuitable for its intended purpose. There’s a delicate balance between efficiency and capability.”

In addition, he continued, implementing compression techniques effectively requires deep technical expertise and significant experimentation. “It’s not a one-size-fits-all solution,” he said. “The right compression strategy depends on the specific model architecture and the target application. This can be a high barrier for teams without specialized AI/ML engineering talent.”

The key to reducing AI energy consumption is combining multiple optimizations — smaller models, compression, efficient prompting, better hardware utilization — to multiply savings, maintained AutoBeam’s Yeomans.

“Also consider caching common responses and using specialized models for specific tasks,” he said, “rather than general-purpose LLMs for everything.”
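Yeomans's caching suggestion can be sketched in a few lines: identical queries are served from an in-memory cache instead of re-running the model. `run_model` below is a hypothetical stand-in for a real, expensive inference call.

```python
# Minimal sketch of response caching: repeated identical queries hit
# an in-memory cache instead of re-running the model. `run_model` is
# a hypothetical stand-in for a real (expensive) inference call.
from functools import lru_cache

CALLS = {"count": 0}

def run_model(prompt: str) -> str:
    CALLS["count"] += 1          # each call here burns real compute
    return f"response to: {prompt}"

@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    return run_model(prompt)

for _ in range(100):             # 100 identical queries...
    answer("What are your store hours?")
print(CALLS["count"])            # ...but the model ran only once
```

For high-traffic, repetitive workloads such as FAQ-style queries, this kind of cache means the energy cost of inference is paid once per unique question rather than once per request.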

“Even if it is tempting to always throw LLMs at every problem, a good rule of thumb is that solutions should go from simple to complex,” added CloudX’s Abulafia. “There are many problems that can be solved using tried-and-true algorithms. You can use those as baselines and grow in complexity from there. First to smaller fine-tuned models, and only then to large models. Always working smart and realizing that bigger is not always better.”

