The cost of acquiring and using a generative artificial intelligence model matters, both for model suppliers and users of such models, as is true for any technology. That might be especially important now, in the early days of deployment, as end users remain unsure about return on investment.
Strategically, one might also argue that GenAI's cost-benefit profile must eventually resemble the economics of the internet to succeed. Namely, GenAI has to become a low-cost solution for high-cost problems.
The internet proved so disruptive and useful precisely because it provided low-cost solutions for high-cost problems. So far, the issue with generative AI has been that it often looks like the opposite: a high-cost solution for lower-value problems. And that is not a surefire recipe for success.
To be sure, we will move up the experience curve, and GenAI costs will drop. All of that suggests we eventually will discover ways to leverage GenAI in a low-cost way to solve high-cost problems. The best precedent is the internet as a platform.
The internet dramatically lowered the costs of communication and information sharing across distances. Tasks that previously required expensive long-distance phone calls, postal mail, or in-person meetings could now be done instantly and cheaply using text messages, app messages, email, file sharing, videoconferencing and so forth.
The low-cost infrastructure of the internet allowed new types of businesses to emerge that would not have been viable before, including wide area or global e-commerce, digital content distribution, online advertising, and software and content distributed by virtual networks rather than physical media.
Also, the internet made vast amounts of information freely accessible that was previously locked behind high-cost barriers like libraries, academic institutions, or proprietary databases. This dramatically reduced the cost of learning and research for individuals and organizations.
Many costs for creating or running businesses also were reduced.
Tools such as wikis, open source software, and cloud computing allowed large-scale collaboration and resource sharing at very low marginal costs, enabling new forms of innovation and problem-solving.
The internet also reduced the capital costs required to start and scale many types of businesses.
Online marketplaces and platforms also dramatically reduced search and transaction costs for buyers and sellers across many industries, and many manual, labor-intensive processes could be automated.
The key insight is that by providing a standardized, open platform with very low marginal costs, the internet enabled solutions to problems and inefficiencies across many domains that were previously prohibitively expensive.
To have the expected impact, GenAI will have to move in those directions as well. It will have to attack the cost basis for lots of business processes, and do so at much lower cost.
But it is a safe prediction that the costs of acquiring and using large language models (training them and generating inferences) will drop over time, as tends to be the rule for any computing-driven use case. And that matters, as generative artificial intelligence is the top AI solution deployed in organizations, according to a new survey by Gartner.
According to a Gartner survey conducted in the fourth quarter of 2023, 29% of the 644 respondents from organizations in the U.S., Germany and the U.K. said they had deployed and were using GenAI, making GenAI the most frequently deployed AI solution. GenAI was found to be more common than other solutions such as graph techniques, optimization algorithms, rule-based systems, natural language processing and other types of machine learning.
The survey also found that utilizing GenAI embedded in existing applications (such as Microsoft's Copilot for Microsoft 365 or Adobe Firefly) is the top way to fulfill GenAI use cases, with 34% of respondents saying this is their primary method of using GenAI. This was more common than other approaches such as customizing GenAI models with prompt engineering (25%), training or fine-tuning bespoke GenAI models (21%), or using standalone GenAI tools such as ChatGPT or Gemini (19%).
| Activity | 2020 Cost (cents/1,000 tokens) | 2024 Cost (cents/1,000 tokens) | Study | Date | Publisher | Key Conclusions |
|---|---|---|---|---|---|---|
| Creating LLMs | 5,333 - 106,667 | 602 | "Large language model" | 2024 | Wikipedia | Training costs have decreased significantly since 2020: in 2020, a 1.5B-parameter model cost $80K-$1.6M to train, while in 2023 a 12B-parameter model cost about $120K |
| Modifying (fine-tuning) | N/A | 60 | "Breaking Down the Cost of Large Language Models" | 2024 | Qwak | Fine-tuning costs are generally lower than training from scratch, but still significant |
| Using (inference), GPT-3 | 60 (output) | 20 (output) | "Breaking Down the Cost of AI for Organizations" | 2024 | TensorOps | Inference costs have decreased, with GPT-3.5 cheaper than earlier versions |
| Using (inference), Claude | N/A | 1,500 (output) | "Breaking Down the Cost of Large Language Models" | 2024 | Qwak | More advanced models like Claude Opus have higher inference costs |
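To make the per-token pricing in the table concrete, here is a minimal sketch of the arithmetic; the two-million-token workload is purely illustrative, and the 20-cent rate is taken from the GPT-3 output row above.

```python
# Illustrative inference-cost arithmetic (token count is an assumption).
def inference_cost_usd(tokens, cents_per_1000_tokens):
    """Dollar cost for a token count at a cents-per-1,000-tokens rate."""
    return tokens / 1000 * cents_per_1000_tokens / 100

# Example: 2 million output tokens at 20 cents per 1,000 tokens comes to $400.
print(inference_cost_usd(2_000_000, 20))  # 400.0
```

The same helper shows why falling per-token rates matter: a drop from 60 to 20 cents per 1,000 tokens cuts the bill for an identical workload by two-thirds.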
In a pre-training scenario involving a model with 70 billion parameters, using YaFSDP can save the resources of approximately 150 GPUs, says Yandex. This translates to potential monthly savings of roughly $0.5 to $1.5 million, depending on the virtual GPU provider or platform.
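Yandex's savings range is easy to sanity-check. The sketch below assumes on-demand cloud rates of roughly $5 to $14 per GPU-hour; those rates are assumptions for illustration, not quotes from any provider.

```python
# Rough sanity check of the stated $0.5M-$1.5M/month range
# (GPU hourly rates below are assumptions, not provider quotes).
def monthly_gpu_cost(num_gpus, rate_per_gpu_hour, hours=24 * 30):
    """Monthly cost of running num_gpus around the clock at a $/GPU-hour rate."""
    return num_gpus * hours * rate_per_gpu_hour

# 150 GPUs saved, at assumed rates of $5 and $14 per GPU-hour:
print(monthly_gpu_cost(150, 5))   # 540000
print(monthly_gpu_cost(150, 14))  # 1512000
```

At those assumed rates, 150 GPUs running continuously cost roughly $0.54M to $1.5M per month, consistent with the range the article cites.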
But innovations in architecture, hardware acceleration, model size, algorithms, open-source models and training methods will all contribute to reducing the cost of creating and using large language models.
| Innovation | Study | Date | Publisher | Key Conclusions |
|---|---|---|---|---|
| Efficient Training Algorithms | "Chinchilla: Training Language Models with Compute-Optimal Scale" | Mar 2022 | DeepMind | Smaller models trained on more data can match performance of larger models, reducing compute costs |
| Hardware Acceleration | "A Survey on Hardware Accelerators for Large Language Models" | Jan 2024 | arXiv | Custom hardware like GPUs, FPGAs and ASICs can significantly improve LLM performance and energy efficiency |
| Model Compression | "LLM in a flash: Efficient Large Language Model Inference with Limited Memory" | Dec 2023 | arXiv | Techniques like quantization and pruning can reduce model size and memory requirements without major performance loss |
| Sparse Models | "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts" | Dec 2021 | Google | Sparse mixture-of-experts models can be more parameter efficient than dense models |
| Distributed Training | "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism" | Sep 2019 | NVIDIA | Techniques for efficiently training very large models across multiple GPUs/nodes |
| Few-Shot Learning | "Language Models are Few-Shot Learners" | May 2020 | OpenAI | Large models can perform well on new tasks with just a few examples, reducing task-specific training data needs |
| Open Source Models | "OPT: Open Pre-trained Transformer Language Models" | May 2022 | Meta AI | Open sourcing large models enables wider research and reduces duplication of training efforts |
| Efficient Architectures | "Efficient Transformers: A Survey" | Dec 2020 | arXiv | Architectural innovations like sparse attention can improve efficiency of transformer models |
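One of these levers, quantization (from the Model Compression row), is simple enough to sketch directly. The toy example below maps float32 weights to int8 with a single scale factor; it is a minimal illustration of the idea, not how any particular production system implements it.

```python
import numpy as np

# Toy post-training quantization: float32 weights -> int8 plus a scale factor.
def quantize_int8(weights):
    scale = np.abs(weights).max() / 127.0  # one quantization step
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in for a weight tensor
q, scale = quantize_int8(w)

print(w.nbytes, q.nbytes)  # 4096 1024: a 4x memory reduction
print(float(np.max(np.abs(w - dequantize(q, scale)))) <= scale)  # True
```

The 4x memory saving is the whole point: serving the same model in a quarter of the memory directly lowers the hardware cost of inference, at the price of a per-weight error bounded by the quantization step.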
----------------