Thursday, April 3, 2025

Are Large Language Models Really "10 Times" More Energy Consumptive than Search?

Most of us have heard claims that a single chatbot query (to a large language model or other generative AI system) is significantly more energy-intensive (often cited as roughly 10 times more) than a traditional search query.


Most of us could agree that the claim is directionally correct for most systems at present, though it may be less of a long-term issue, since energy intensity is virtually certain to fall over time.


Computational complexity obviously is an issue. Traditional search uses pre-computed indexes: much of the “heavy lifting” (indexing the web) is done beforehand.


Large language models run a generative process through a massive neural network (often with billions or trillions of parameters). Each query requires significant computation to understand a prompt and generate a novel response. This "inference" process is inherently more computationally demanding per query than retrieving indexed information.


Early energy estimates produced the "10x" figure. These estimates looked at the computational operations (FLOPs, floating-point operations, a count of work rather than a rate) required for each type of task and translated that into potential energy use based on typical hardware efficiency.
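As a rough sketch of how such FLOPs-based estimates work, the calculation below converts an assumed per-query compute budget into watt-hours. Every number here (parameter count, tokens per response, effective FLOPs per joule) is an illustrative assumption, not a figure taken from any of the studies discussed in this post.

```python
def flops_per_query(n_params: float, n_tokens: int) -> float:
    """Rough rule of thumb: ~2 FLOPs per model parameter per generated token."""
    return 2.0 * n_params * n_tokens

def energy_wh(flops: float, effective_flops_per_joule: float) -> float:
    """Convert a FLOP count into watt-hours (3600 joules per Wh)."""
    return flops / effective_flops_per_joule / 3600.0

# Assumptions: a 175B-parameter model generating a 500-token response,
# on hardware delivering an effective 5e10 FLOPs per joule (peak
# accelerator efficiency heavily discounted for real-world utilization
# and datacenter overhead).
compute = flops_per_query(n_params=175e9, n_tokens=500)   # 1.75e14 FLOPs
estimate = energy_wh(compute, effective_flops_per_joule=5e10)

print(f"Estimated energy per query: {estimate:.2f} Wh")  # ~0.97 Wh
```

Under these assumptions the estimate lands near 1 Wh, within the 1-10 Wh range some analysts have proposed and several times the ~0.3 Wh figure often cited for a traditional search. Changing any one assumption shifts the result considerably, which is exactly why published estimates vary so widely.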


But that probably already is an out-of-date way to make the comparison. As search engines increasingly integrate generative AI into search, the per-query energy difference between an LLM query and a search query is likely narrowing substantially.


Study/Source: Luccioni, Viguier & Ligozat (NeurIPS 2023; originally arXiv 2022)
Year: 2022/2023
Model(s) Analyzed: BLOOM (176B parameters)
Key Finding / Estimate per Query: Estimated inference energy consumption for BLOOM, varying significantly based on hardware (e.g., A100 vs. T4 GPUs). Provided a methodology for carbon footprint calculation.
Context / Notes: Focused on BLOOM, an open model. Emphasized the impact of hardware and location (electricity grid mix) on the carbon footprint. Did not give a single universal Wh/query figure.

Study/Source: Patterson et al. (Google Research) (arXiv 2021)
Year: 2021
Model(s) Analyzed: LaMDA, MUM (conceptual / internal Google models)
Key Finding / Estimate per Query: Not a direct per-query energy figure, but stated "some models used by Search are already large," and newer AI features (like MUM) are more compute-intensive.
Context / Notes: Context was a broader discussion of model efficiency and training costs. Confirms Google's internal view that advanced AI features increase computational demands over basic search.

Study/Source: De Vries (Digiconomist) (Joule, 2023, and ongoing analysis)
Year: 2023
Model(s) Analyzed: General LLMs (e.g., based on ChatGPT/GPT-3 scale)
Key Finding / Estimate per Query: Estimated a single ChatGPT query could consume roughly 0.001-0.01 kWh (1-10 Wh) on average, potentially much higher depending on complexity and hardware. Compared this to a Google search (~0.0003 kWh, or 0.3 Wh).
Context / Notes: Estimates based on assumed hardware (like Nvidia A100 GPUs), server power usage, and query processing time. Acknowledges high uncertainty. Helped popularize the ~10x search comparison.

Study/Source: Gupta et al. (Stanford HAI) (working paper / estimates)
Year: 2023
Model(s) Analyzed: Conceptual LLM (e.g., GPT-3 scale)
Key Finding / Estimate per Query: Estimated generating a single image with a diffusion model might consume as much energy as charging a smartphone. Extrapolated that text generation is also energy-intensive.
Context / Notes: Focused partially on image-generation AI but discussed text AI costs. Used comparisons to relatable actions (phone charging) to illustrate magnitude. Emphasized that inference costs add up globally.

Study/Source: Google public statements / reports (various)
Year: Ongoing
Model(s) Analyzed: Google's AI services (incl. Search Generative Experience)
Key Finding / Estimate per Query: Repeatedly stated that generative AI queries are more computationally intensive and thus consume more energy than traditional search queries. No specific public Wh/query figure released.
Context / Notes: Confirms the general premise from the provider's side. Focuses on efforts to improve efficiency via hardware (TPUs) and software optimization.

Study/Source: University research (various; e.g., studies citing FLOPs)
Year: Ongoing
Model(s) Analyzed: Various (BERT, GPT variants, etc.)
Key Finding / Estimate per Query: Often estimate FLOPs (floating-point operations) per query or per token; a query might require trillions of FLOPs. This can be converted to energy using hardware efficiency (joules per FLOP), leading to estimates often in the 0.1 Wh to 10 Wh range depending on assumptions.
Context / Notes: These are often theoretical calculations based on model architecture and assumed hardware specs (e.g., joules per FLOP for a specific GPU). Highly variable.


Also, models are becoming more energy efficient, as tends to happen with all computing processes as they mature.


So at this point, we really do not know much about energy consumption, except that, on today's hardware and with today's algorithms and compute intensity, it is reasonable to believe that more computation requires more energy.


Still, logic also suggests that simple queries will require less computation, and therefore less energy. 

A simple classification task, retrieving a cached answer, or generating a very short response using a smaller, specialized model might have an energy cost that isn't dramatically higher than a complex search operation.
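To illustrate how much the answer can vary with query complexity, the sketch below applies the same crude rule of thumb used in FLOPs-based estimates (~2 FLOPs per parameter per generated token, with an assumed effective efficiency of 5e10 FLOPs per joule) to a small model producing a short reply versus a large model producing a long one. Both scenarios are hypothetical.

```python
def query_energy_wh(n_params: float, n_tokens: int,
                    effective_flops_per_joule: float = 5e10) -> float:
    """Very rough per-query energy: ~2 FLOPs/param/token, converted to Wh."""
    return 2.0 * n_params * n_tokens / effective_flops_per_joule / 3600.0

# Assumed scenarios, purely illustrative:
small_short = query_energy_wh(n_params=7e9, n_tokens=50)    # small model, short reply
large_long = query_energy_wh(n_params=175e9, n_tokens=500)  # large model, long reply

print(f"Small model, short reply: {small_short:.4f} Wh")
print(f"Large model, long reply:  {large_long:.2f} Wh")
print(f"Ratio: {large_long / small_short:.0f}x")  # 250x under these assumptions
```

Under these assumptions the two queries differ by a factor of 250, which is why a single "energy per LLM query" number can be misleading: routing, caching, and model selection matter as much as the existence of generative AI itself.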


But actual consumption is certain to vary by model, model architecture, data center and hardware platform. And since no at-scale “AI as a service” supplier seems to have released any actual studies on the subject, we might assume they already know the energy consumption increase is significant.


