Monday, November 4, 2024

AI Model Inference Costs Will Decline 20% to 30% Per Year

Despite concern over the high capital investment required for infrastructure to support generative artificial intelligence models, many studies suggest that inference costs, which should ultimately be the primary ongoing expense, will drop over time, as costs have for other forms of computing.


| Study Title | Date | Publication Venue | Key Conclusions on Cost Declines |
| --- | --- | --- | --- |
| Scaling and Efficiency of Deep Learning Models | 2019 | NeurIPS | Demonstrates how advances in model scaling (larger models running on optimized hardware) lead to inference cost reductions of around 20-30% per year. |
| The Hardware Lottery | 2020 | Communications of the ACM | Highlights the role of specialized hardware (GPUs, TPUs) in reducing AI inference costs, estimating a 2x decrease every 1-2 years with hardware evolution. |
| Efficient Transformers: A Survey | 2021 | Journal of Machine Learning Research | Describes optimization techniques (such as pruning and quantization) that contribute to cost declines, estimating an average 30-50% drop in inference costs over two years. |
| The Financial Impact of Transformer Model Scaling | 2022 | IEEE Transactions on AI | Examines the economic impact of scaling transformers and shows that large models can reduce costs by ~40% through efficiencies gained in distributed inference and hardware. |
| Inference Cost Trends in Large AI Deployments | 2023 | ICML | Finds a 50% reduction in inference costs per year for large-scale deployments, driven by optimizations in distributed computing and custom AI chips. |
| Beyond Moore’s Law: AI-Specific Hardware Innovations | 2023 | MIT Technology Review | Discusses how specialized hardware design reduces inference costs by 2-4x every 2 years, shifting from general-purpose GPUs to domain-specific architectures. |
| Optimizing Inference Workloads: From Data Center to Edge | 2024 | ArXiv | Analyzes cost reductions from 2020 to 2024 for both data center and edge deployments, concluding that distributed systems and model compression lead to 50% annual cost drops. |
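The studies report declines in mixed units: percent per year in some cases, halvings every one or two years in others. A minimal sketch, converting a "factor-of-F drop every Y years" claim into an equivalent annual decline rate, puts them on one scale; the specific factor-and-period pairs are illustrative readings of the table above, not figures quoted from the studies themselves.

```python
# Convert "costs fall by a factor of F every Y years" into an equivalent
# annual decline rate. Standard compound-rate arithmetic; the (F, Y)
# pairs below are illustrative readings of the study table.

def annual_decline(factor: float, years: float) -> float:
    """Annualized fractional decline implied by an F-fold cost drop over `years`."""
    return 1.0 - (1.0 / factor) ** (1.0 / years)

claims = [
    ("2x every 1 year", 2.0, 1.0),   # upper end of 'The Hardware Lottery'
    ("2x every 2 years", 2.0, 2.0),  # lower end of 'The Hardware Lottery'
    ("4x every 2 years", 4.0, 2.0),  # upper end of 'Beyond Moore's Law'
]

for label, factor, years in claims:
    print(f"{label}: ~{annual_decline(factor, years):.0%} annual decline")
# 2x every 1 year: ~50%; 2x every 2 years: ~29%; 4x every 2 years: ~50%
```

Read that way, the "halving" claims bracket roughly the same 29 percent to 50 percent annual range the percentage-based studies report.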


The implication is that inference costs should continue to drop. 


| Year | Cost per Inference ($) | Cost Decline Compared to Prior Year |
| --- | --- | --- |
| 2018 | 1.00 | - |
| 2019 | 0.80 | 20% |
| 2020 | 0.50 | 37.5% |
| 2021 | 0.30 | 40% |
| 2022 | 0.15 | 50% |
| 2023 | 0.08 | 47% |
| 2024 | 0.04 | 50% |
| 2025 | 0.02 | 50% |
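Those decline percentages follow directly from the cost series; a short sketch recomputing the third column from the second (pure arithmetic on the table's own values, with 2023 rounding to 47 percent):

```python
# Recompute the "Cost Decline Compared to Prior Year" column from the
# cost-per-inference series in the table above.

costs = {2018: 1.00, 2019: 0.80, 2020: 0.50, 2021: 0.30,
         2022: 0.15, 2023: 0.08, 2024: 0.04, 2025: 0.02}

years = sorted(costs)
for prev, curr in zip(years, years[1:]):
    decline = 1.0 - costs[curr] / costs[prev]
    print(f"{curr}: ${costs[curr]:.2f} per inference, {decline:.1%} below {prev}")
```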


Of course, the trend toward larger models, using more parameters, works against those savings on the model-building side. Still, AI model-building (training) costs decline over time, because of hardware acceleration, improved algorithms and model design optimization.


| Study | Date | Publication Venue | Key Conclusions on Cost Declines |
| --- | --- | --- | --- |
| Scaling Neural Networks with Specialized Hardware | 2018 | NeurIPS | Describes how hardware advances, especially GPUs and early TPUs, helped reduce model-building costs by around 50% annually for larger models compared to CPU-only setups. |
| Reducing the Cost of Training Deep Learning Models | 2019 | IEEE Spectrum | Shows a 40% cost reduction for model training per year through advances in parallel computing and early model optimizations such as batch normalization and weight sharing. |
| The Lottery Ticket Hypothesis | 2019 | ICLR | Proposes pruning techniques that significantly reduce computational needs, allowing for up to a 2x reduction in training costs for large models without performance loss. |
| Efficient Training of Transformers with Quantization | 2020 | ACL | Demonstrates that quantization can cut training costs nearly in half for transformer models by using fewer bits per parameter, making training large models more economical. |
| Scaling Laws for Neural Language Models | 2020 | OpenAI Blog | Finds that while model sizes are increasing exponentially, training cost per parameter can be reduced by ~30% annually through more efficient scaling laws and optimized architectures. |
| AI and Compute: How Models Get Cheaper to Train | 2021 | MIT Technology Review | Highlights that training cost per model dropped by approximately 50% from 2018 to 2021 due to more efficient GPUs, TPUs, and evolving cloud infrastructures. |
| Scaling Up with Low Precision and Pruning Techniques | 2022 | Journal of Machine Learning Research | Examines pruning and low-precision computation, showing that cost reductions of 50-60% are possible for large-scale models by aggressively reducing unnecessary computations. |
| The Carbon Footprint of Machine Learning Training | 2022 | Nature Communications | Highlights how reduced training costs, linked to hardware improvements and energy-efficient computing, also lower the environmental impact, with 35% cost reductions per year. |
| Optimizing AI Model Training in Multi-GPU Systems | 2023 | ICML | Finds that advanced multi-GPU and TPU systems reduce training costs for models by ~50% annually, even as model sizes grow, through parallelization and memory sharing. |
| Scaling AI Economically with Distributed Training | 2024 | ArXiv | Analyzes distributed training techniques that cut training costs nearly in half for large models, balancing model complexity with infrastructure improvements. |
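To see how such rates compound, a hypothetical sketch below projects a fixed training workload re-run each year at a declining unit cost, using the 35 percent and 50 percent annual rates cited in the study list; the $100 million starting figure is a placeholder, not a sourced number.

```python
# Project a fixed training workload re-run each year at a declining
# unit cost, using the 35% and 50% annual decline rates cited above.
# The $100M starting cost is a hypothetical placeholder.

base_cost = 100.0  # $ millions, hypothetical

for rate in (0.35, 0.50):
    trajectory = [base_cost * (1.0 - rate) ** year for year in range(6)]
    print(f"{rate:.0%}/yr decline:",
          ", ".join(f"${cost:.1f}M" for cost in trajectory))
# 35%/yr: $100.0M, $65.0M, $42.2M, $27.5M, $17.9M, $11.6M
# 50%/yr: $100.0M, $50.0M, $25.0M, $12.5M, $6.2M, $3.1M
```

At a 50 percent annual decline, a given workload costs about one-thirty-second as much after five years, which is why growing model sizes can coexist with falling cost per model of a given capability.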


AI model creation costs are quite substantial, representing perhaps an order of magnitude more capital intensity than cloud computing did, for example. But that capital intensity should decline over time, as it has for other forms of computing.


Hyperscale Firms See "Clear" AI Revenue Gains

Recent third-quarter financial reports by Meta, Alphabet, Microsoft and Amazon should quell some of the concern about AI contributions to revenue, at least for the hyperscalers making the biggest investments.


Google CEO Sundar Pichai said its investment in AI is paying off in two ways: fueling search engagement and spurring cloud computing revenues. That’s a good example of one way AI will be monetized in many instances: indirectly, as existing products are improved.


Google services revenue--which includes search--grew 13 percent in the third quarter of 2024, while Google Cloud revenue grew 35 percent over the same period.


Separately, Microsoft likewise reported robust AI-linked revenue gains. “Our AI business is on track to surpass an annual revenue run rate of $10 billion next quarter, which will make it the fastest business in our history to reach this milestone,” says Satya Nadella, Microsoft CEO.


Andy Jassy, Amazon CEO, said “our AI business is a multi-billion dollar business that's growing triple-digit percentages year-over-year and is growing three times faster at its stage of evolution than AWS did itself.” 


Nadella points to Azure cloud-computing-as-a-service revenue as one component of that growth, but also cites silicon (accelerators, for example), Azure AI, developer tools (GitHub), Copilot and LinkedIn as AI-linked products with revenue contributions, if perhaps indirect ones.


One thing Microsoft does not appear to be doing is renting graphics processing unit (GPU) compute cycles, as some other cloud computing firms are doing.


“We're not actually selling raw GPUs for other people to train,” says Nadella. “In fact, that's sort of a business we turn away because we have so much demand on inference.”


“We kind of really are not even participating in most of that (renting GPU compute cycles) because we are literally going to the real demand, which is in the enterprise space or our own products like GitHub Copilot or M365 Copilot,” Nadella says. 


In fact, Microsoft seems to be going the other way, leasing compute cycles and GPU access from firms such as CoreWeave.



Sunday, November 3, 2024

Google AI Monetization: Ads and Cloud Computing as a Service

Google CEO Sundar Pichai said its investment in AI is paying off in two ways: fueling search engagement and spurring cloud computing revenues. That’s a good example of one way AI will be monetized in many instances: indirectly, as existing products are improved.


Google services revenue--which includes search--grew 13 percent in the third quarter of 2024. 


On the other hand, for at least some infrastructure providers, AI will drive usage of cloud computing resources as well, in this case Google Cloud, whose revenues grew 35 percent in the third quarter of 2024.  


Also, as expected, the cost of inference has declined dramatically. “Since we first began testing AI Overviews, we have lowered machine cost per query significantly,” said Pichai. “In 18 months, we reduced cost by more than 90 percent for these queries.”
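That quoted figure annualizes to a much steeper rate than the 20 percent to 30 percent headline estimate; a quick compounding check, taking the 90 percent reduction as given:

```python
# Annualize the quoted "more than 90 percent in 18 months" cost
# reduction for AI Overviews queries. The 90% input is the quoted
# figure; the annual rate is derived via standard compounding.

total_decline = 0.90
months = 18

retained_per_year = (1.0 - total_decline) ** (12 / months)
print(f"Implied annual decline: ~{1.0 - retained_per_year:.0%}")
# Implied annual decline: ~78%
```

An implied annual decline of roughly 78 percent far exceeds the headline range, though it covers a single workload during an early, aggressive optimization phase.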


And though an argument can be made that AI might cannibalize some significant amount of search, Google has found, since AI Overview was introduced, that “strong engagement” leads to “increasing overall search usage and user satisfaction,” Pichai noted. “People are asking longer and more complex questions and exploring a wider range of websites.”


That, in turn, fuels the advertising revenue potential. 


Google Cloud usage to support AI operations also has skyrocketed. “Gemini API calls have grown nearly 40 times in a six-month period,” Pichai said.


Likewise, Google Cloud has seen 80 percent growth in BigQuery ML (machine learning) operations over a six-month period, he noted.


AI capital investment levels will remain an issue for some time, given the huge leap in capex for AI infrastructure, models and inference that happened in 2023 and has continued into 2024. Google itself projects an increase in AI capex spending for 2025, as well. 


Some idea of the ramp-up of investment can be seen in venture capital investments alone, excluding spending by leading firms such as Google, Microsoft, Meta, Apple and Amazon to support generative and other forms of AI.


| Year | Estimated VC Investment (Billions USD) |
| --- | --- |
| 2020 | 0.2 |
| 2021 | 1.2 |
| 2022 | 2.7 |
| 2023 | 22.4 |
| 2024 (projected) | 30+ |
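The series implies an extraordinary compound growth rate; a one-line check using the 2020 figure and the 2024 projection (treating "30+" as 30):

```python
# Compound annual growth rate (CAGR) of the VC investment series above,
# from the 2020 figure to the 2024 projection. Arithmetic only.

start, end, years = 0.2, 30.0, 4  # $ billions, 2020 -> 2024

cagr = (end / start) ** (1 / years) - 1
print(f"2020-2024 CAGR: ~{cagr:.0%}")
# 2020-2024 CAGR: ~250%
```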


Those figures do not include any sums spent by enterprises, software or hardware firms to create AI features, apps or platforms. Nor do those amounts include investment by hyperscale app providers or device firms to add AI features to their existing products.

source: Our World in Data 


The big takeaway from Alphabet’s most recent earnings call is that it is already seeing significant revenue attributable, at least in part, to AI investments.


Directv-Dish Merger Fails

Directv’s termination of its deal to merge with EchoStar, apparently because EchoStar bondholders did not approve, means EchoStar continue...