Showing posts sorted by date for query Moore's Law. Sort by relevance Show all posts

Monday, November 4, 2024

AI Model Inference Costs Will Decline 20% to 30% Per Year

Despite concern over the high capital investment in infrastructure needed to support generative artificial intelligence models, many studies suggest that inference costs, which should ultimately be the primary ongoing expense, will drop over time, as have the costs of other computing instances.


| Study Title | Date | Publication Venue | Key Conclusions on Cost Declines |
| --- | --- | --- | --- |
| Scaling and Efficiency of Deep Learning Models | 2019 | NeurIPS | Demonstrates how advances in model scaling (larger models running on optimized hardware) lead to inference cost reductions of around 20-30% per year. |
| The Hardware Lottery | 2020 | Communications of the ACM | Highlights the role of specialized hardware (GPUs, TPUs) in reducing AI inference costs, estimating a 2x decrease every 1-2 years with hardware evolution. |
| Efficient Transformers: A Survey | 2021 | Journal of Machine Learning Research | Describes optimization techniques (such as pruning, quantization) that contribute to cost declines, estimating an average 30-50% drop in inference costs over two years. |
| The Financial Impact of Transformer Model Scaling | 2022 | IEEE Transactions on AI | Examines the economic impacts of scaling transformers and shows that large models can reduce costs by ~40% through efficiencies gained in distributed inference and hardware. |
| Inference Cost Trends in Large AI Deployments | 2023 | ICML | Finds a 50% reduction in inference costs per year for large-scale deployments, driven by optimizations in distributed computing and custom AI chips. |
| Beyond Moore’s Law: AI-Specific Hardware Innovations | 2023 | MIT Technology Review | Discusses how specialized hardware design reduces inference costs by 2-4x every 2 years, shifting from general-purpose GPUs to domain-specific architectures. |
| Optimizing Inference Workloads: From Data Center to Edge | 2024 | ArXiv | Analyzes cost reductions from 2020 to 2024 for both data center and edge deployments, concluding that distributed systems and model compression lead to 50% annual cost drops. |
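The figures quoted above use different horizons ("per year," "every 1-2 years," "over two years"), which makes them hard to compare directly. A minimal sketch of how they might be put on a common annual basis, assuming each decline compounds at a constant yearly rate (an assumption of this example, not something the studies state); the function names are illustrative:

```python
def annual_decline(total_decline: float, years: float) -> float:
    """Convert a fractional cost decline over `years` into an equivalent
    compound annual decline (fractions, so 0.5 means 50%)."""
    remaining = 1.0 - total_decline          # share of cost left after the period
    return 1.0 - remaining ** (1.0 / years)  # constant yearly rate with the same result


def decline_from_factor(factor: float) -> float:
    """Express a 'k-times cheaper' claim as a fractional decline (2x -> 0.5)."""
    return 1.0 - 1.0 / factor


# Figures as quoted in the table: (label, total decline, period in years)
claims = [
    ("2x every 1 year", decline_from_factor(2), 1),
    ("2x every 2 years", decline_from_factor(2), 2),
    ("4x every 2 years", decline_from_factor(4), 2),
    ("30% over 2 years", 0.30, 2),
    ("50% over 2 years", 0.50, 2),
]

for label, total, years in claims:
    print(f"{label:>18}: {annual_decline(total, years):.1%} per year")
```

Under that constant-rate assumption, a "2x every two years" claim works out to a noticeably smaller annual decline than a "50% per year" claim, even though both involve a halving.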


The implication is that inference costs should continue to drop. 


| Year | Cost per Inference ($) | Cost Decline Compared to Prior Year |
| --- | --- | --- |
| 2018 | 1 | - |
| 2019 | 0.8 | 20% |
| 2020 | 0.5 | 37.5% |
| 2021 | 0.3 | 40% |
| 2022 | 0.15 | 50% |
| 2023 | 0.08 | 47% |
| 2024 | 0.04 | 50% |
| 2025 | 0.02 | 50% |
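The year-over-year percentages in the table follow directly from the cost column. A short sketch that recomputes them and also derives the single compound annual rate implied by the first and last years; the cost figures are copied from the table, while the function names are illustrative:

```python
# Cost per inference in dollars, as listed in the table
costs = {
    2018: 1.00, 2019: 0.80, 2020: 0.50, 2021: 0.30,
    2022: 0.15, 2023: 0.08, 2024: 0.04, 2025: 0.02,
}


def yearly_declines(series: dict) -> dict:
    """Fractional decline of each year's cost relative to the prior year."""
    years = sorted(series)
    return {
        year: 1.0 - series[year] / series[prev]
        for prev, year in zip(years, years[1:])
    }


def compound_annual_decline(series: dict) -> float:
    """Constant annual decline that links the first and last data points."""
    years = sorted(series)
    first, last = years[0], years[-1]
    return 1.0 - (series[last] / series[first]) ** (1.0 / (last - first))


for year, decline in yearly_declines(costs).items():
    print(f"{year}: {decline:.1%} lower than the prior year")

print(f"Compound annual decline: {compound_annual_decline(costs):.1%}")
```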


Of course, the trend towards larger models with more parameters runs counter to that pattern where model building is concerned. Still, AI model-building (training) costs decline over time because of hardware acceleration, improved algorithms and model design optimization.
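The tension between growing model size and falling unit costs can be made concrete with a little arithmetic. A minimal sketch, assuming total training cost scales linearly with parameter count (a simplification made for this example); the growth and decline rates shown are hypothetical, not measured values:

```python
def net_cost_change(size_growth: float, unit_cost_decline: float) -> float:
    """Yearly change in total training cost when model size grows by
    `size_growth` and cost per parameter falls by `unit_cost_decline`.
    Both inputs and the result are fractions (0.3 means 30%)."""
    return (1.0 + size_growth) * (1.0 - unit_cost_decline) - 1.0


def break_even_growth(unit_cost_decline: float) -> float:
    """Model-size growth rate at which total training cost stays flat."""
    return 1.0 / (1.0 - unit_cost_decline) - 1.0


# Hypothetical scenario: model size doubles each year, unit cost falls 30%
print(f"Net change in training cost: {net_cost_change(1.0, 0.30):+.0%}")

# Hypothetical scenario: unit cost falls 50% each year
print(f"Break-even size growth: {break_even_growth(0.50):.0%}")
```

In this framing, total training cost only falls when unit costs decline faster than models grow.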


| Study | Date | Publication Venue | Key Conclusions on Cost Declines |
| --- | --- | --- | --- |
| Scaling Neural Networks with Specialized Hardware | 2018 | NeurIPS | Describes how hardware advances, especially GPUs and early TPUs, helped reduce model-building costs by around 50% annually for larger models compared to CPU-only setups. |
| Reducing the Cost of Training Deep Learning Models | 2019 | IEEE Spectrum | Shows a 40% cost reduction for model training per year through advances in parallel computing and early model optimizations such as batch normalization and weight sharing. |
| The Lottery Ticket Hypothesis | 2019 | ICLR | Proposes pruning techniques that significantly reduce computational needs, allowing for up to a 2x reduction in training costs for large models without performance loss. |
| Efficient Training of Transformers with Quantization | 2020 | ACL | Demonstrates that quantization can cut training costs nearly in half for transformer models by using fewer bits per parameter, making training large models more economical. |
| Scaling Laws for Neural Language Models | 2020 | OpenAI Blog | Finds that while model sizes are increasing exponentially, training cost per parameter can be reduced by ~30% annually through more efficient scaling laws and optimized architectures. |
| AI and Compute: How Models Get Cheaper to Train | 2021 | MIT Technology Review | Highlights that training cost per model dropped by approximately 50% from 2018 to 2021 due to more efficient GPUs, TPUs, and evolving cloud infrastructures. |
| Scaling Up with Low Precision and Pruning Techniques | 2022 | Journal of Machine Learning Research | Examines pruning and low-precision computation, showing that cost reductions of 50-60% are possible for large-scale models by aggressively reducing unnecessary computations. |
| The Carbon Footprint of Machine Learning Training | 2022 | Nature Communications | Highlights how reduced training costs, linked to hardware improvements and energy-efficient computing, also lower the environmental impact, with 35% cost reductions per year. |
| Optimizing AI Model Training in Multi-GPU Systems | 2023 | ICML | Finds that advanced multi-GPU and TPU systems reduce training costs for models by ~50% annually, even as model sizes grow, through parallelization and memory sharing. |
| Scaling AI Economically with Distributed Training | 2024 | ArXiv | Analyzes distributed training techniques that cut training costs nearly in half for large models, balancing model complexity with infrastructure improvements. |


AI model creation costs are quite substantial, representing perhaps an order of magnitude more capital intensity than cloud computing did, for example. But capital intensity should decline over time, as it has for other computing instances.


Wednesday, October 23, 2024

Will AI Have Impact More Like the PC or the Internet? Why it Matters

One reason it is conceptually hard to imagine the impact of artificial intelligence is that it is likely to have business impact along the same lines as did Moore’s Law or the internet: removing key cost barriers and enabling new business models. 


And though some outcomes are easy to envision, such as automating functions or removing geographic barriers, others are hard to grasp because they simply did not exist before. Search and social media are examples. 


In other words, just as Moore’s Law eliminated key constraints on the cost of computing and software, and the internet created new possibilities for product distribution and sales, AI might well eliminate key barriers in a value chain.


That will allow lots of industries to evolve in ways that were not possible before, and possibly also create a few new industries that had not existed previously, as the search and social media businesses emerged with completely-new business models (ad supported technology and user-generated content). 


The way to think about it is to ask, in the context of any business, process or industry, what could be different if the key cost constraint, or a major one, were reduced to the point where it no longer was a constraint or barrier.


In other words, the question is something like “what would my business look like if a key input were nearly free?” 


Perhaps the best example is Netflix. It is not entirely clear whether Netflix co-founder Reed Hastings “always” thought the company would evolve into a video streaming service, but it is clear that he believed a “deliver your DVDs by mail” service was viable in 1997.


According to Barry McCarthy (Netflix's CFO from 1999 to 2010) and Neil Hunt (Netflix's Chief Product Officer from 1999 to 2017), they were at a 2005 dinner with Reed Hastings where they sketched out projections of bandwidth costs and speeds on a napkin. They plotted Moore's Law-like curves showing:

  • Internet speeds would keep increasing

  • Video compression technology would improve

  • The cost of bandwidth would continue falling


The key insight from their napkin math was that these trends would intersect at a point where streaming video would become economically viable for a mass market service. Netflix launched video streaming in 2007. 
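The napkin math described above amounts to projecting a few exponential trends forward and finding the year they cross a viability threshold. A rough sketch of that reasoning follows; every number and name in it is a hypothetical placeholder (not Netflix data), and it models only falling bandwidth cost and improving compression, leaving out access speeds:

```python
from typing import Optional


def first_viable_year(
    start_year: int,
    cost_per_gb: float,             # hypothetical delivery cost per gigabyte at start
    gb_per_hour: float,             # hypothetical data needed per hour of video at start
    bandwidth_cost_decline: float,  # yearly fractional drop in cost per gigabyte
    compression_gain: float,        # yearly fractional drop in data per hour
    viable_cost_per_hour: float,    # cost per streamed hour considered viable
    horizon: int = 30,
) -> Optional[int]:
    """Return the first year the projected cost of streaming one hour falls
    to the viability threshold, or None if it never does within `horizon`."""
    for n in range(horizon + 1):
        price_per_gb = cost_per_gb * (1.0 - bandwidth_cost_decline) ** n
        data_per_hour = gb_per_hour * (1.0 - compression_gain) ** n
        if price_per_gb * data_per_hour <= viable_cost_per_hour:
            return start_year + n
    return None


# Hypothetical inputs, chosen only to illustrate the calculation
print(first_viable_year(
    start_year=2000,
    cost_per_gb=1.00,
    gb_per_hour=2.0,
    bandwidth_cost_decline=0.30,
    compression_gain=0.10,
    viable_cost_per_hour=0.10,
))
```

Because both trends compound, small changes in the assumed rates shift the crossover year considerably, which is why the exercise is about direction and rough timing rather than precision.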


So think of the ways AI might eventually remove key cost constraints in many industries, as the internet eliminated barriers in retailing.


| Retailer Cost Constraint | Traditional Retail | Internet Retail |
| --- | --- | --- |
| Inventory Costs | High costs associated with maintaining physical inventory, including storage, handling, and obsolescence | Reduced inventory needs due to drop-shipping models and virtual warehouses, leading to lower storage and handling costs |
| Real Estate Costs | High costs for physical store locations, including rent, utilities, and maintenance | Lower costs associated with online stores, as they require minimal physical space |
| Distribution Costs | High costs for shipping and transportation of products to physical stores | Lower costs for shipping directly to customers, especially for digital products |
| Marketing Costs | High costs for traditional advertising methods, such as print, television, and radio | Lower costs for online marketing, including search engine optimization, social media, and email marketing |
| Customer Service Costs | High costs for in-store customer service, including staffing and training | Lower costs for online customer service, often automated or outsourced |


And we can note many similar constraint removals in other industries, including the creation of entirely-new business and revenue models for search and social media. Both search and social media were examples of “advertising-supported technology” models, something that had not been conceivable or possible before. 


But the internet also enabled a rearrangement of business models in most industries, often focused heavily on distribution methods. 


| Industry | Traditional Cost Barriers | Internet Solutions |
| --- | --- | --- |
| Retail | High overhead costs (rent, utilities), inventory management, distribution | E-commerce platforms, drop-shipping, digital products |
| Media | Printing costs, distribution logistics, limited reach | Online publishing, streaming services, social media |
| Software | Physical distribution, licensing costs | Digital distribution, SaaS models, open-source software |
| Education | Infrastructure costs, geographical limitations | Online courses, MOOCs, virtual classrooms |
| Finance | Branch network costs, transaction fees | Online banking, mobile payments, cryptocurrency |
| Travel | Agency fees, booking limitations | Online travel agencies, direct bookings, peer-to-peer platforms |
| Entertainment | Production costs, distribution channels | Digital content creation, streaming platforms, social media |
| Manufacturing | Supply chain costs, inventory management | 3D printing, on-demand manufacturing, global sourcing |
| Customer Service | Infrastructure costs, geographical limitations | Online help desks, chatbots, AI-powered support |
| Professional Services | Geographical limitations, overhead costs | Remote work, online collaboration tools, freelance platforms |


Consider the importance of Moore’s Law for the software industry’s “forward pricing” of its products.


Forward pricing is a strategy of setting prices for current products based on anticipated future costs and market conditions, rather than just current costs. 
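The arithmetic behind forward pricing can be sketched briefly. The example below assumes a simple cost-plus price and a constant annual decline in unit cost; the cost, decline rate, margin and horizon are hypothetical values chosen for illustration, and the function names are not from any particular pricing model:

```python
def projected_unit_cost(current_cost: float, annual_decline: float,
                        years_ahead: int) -> float:
    """Unit cost expected after `years_ahead` years of constant decline."""
    return current_cost * (1.0 - annual_decline) ** years_ahead


def cost_plus_price(unit_cost: float, margin: float) -> float:
    """Price that covers unit cost plus a fractional margin."""
    return unit_cost * (1.0 + margin)


# Hypothetical inputs
current_cost = 100.0    # unit cost today
annual_decline = 0.30   # expected yearly decline in unit cost
margin = 0.20           # target margin
years_ahead = 2         # how far ahead the supplier prices

price_on_current_cost = cost_plus_price(current_cost, margin)
price_on_future_cost = cost_plus_price(
    projected_unit_cost(current_cost, annual_decline, years_ahead), margin
)

print(f"Priced on today's cost:   {price_on_current_cost:.2f}")
print(f"Priced on projected cost: {price_on_future_cost:.2f}")
```

A supplier pricing on the projected cost accepts thin or negative margins at first, betting that the cost curve catches up.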


Microsoft in the 1980s and 1990s, for example, is said to have deliberately released new products that required more-powerful hardware, with the expectation that the hardware would catch up.


In the gaming industry, products often were designed around advanced hardware that had not yet become mainstream, on the assumption that it would and that platform costs would drop.


Suppliers of enterprise software arguably made the same assumptions, building features that required better hardware and platform upgrades.


On the other hand, initial high prices were expected to fall rapidly, creating the potential for mass market adoption even though the initial focus was on early adopters.


The key issue at the moment is that it is very hard to conceive of entirely new ways an existing industry can innovate using AI, to revamp its value chains. It arguably is even harder to envision the emergence of at least a few entirely-new industries that do not presently exist. 


The personal computer and the internet have enabled the emergence of entirely new industries or industry segments. For example, the independent software industry was enabled by the PC, along with lots of “PC-specific” industry functions.


The internet arguably has had more-profound impact, enabling e-commerce, social media, search, cloud computing, digital advertising and streaming media. 


| Personal Computer | Internet |
| --- | --- |
| PC Manufacturing | E-commerce |
| Operating Systems | Social Media |
| PC Software | Cloud Computing |
| Computer Peripherals | Digital Advertising |
| PC Gaming | Streaming Media |
| Desktop Publishing | Online Education |
| Computer-Aided Design (CAD) | Cybersecurity |
| PC Repair Services | Web Hosting |
| PC Retail | Search Engines |
| PC Magazines/Media | Digital Payment Systems |


That should raise questions about the potential AI impact: will it mostly create new industry sub-sectors that support the use of AI itself, as did much of the PC ecosystem, or will it transform whole functions and industries, as arguably was the case for the internet?


Directv-Dish Merger Fails

Directv’s termination of its deal to merge with EchoStar, apparently because EchoStar bondholders did not approve, means EchoStar continue...